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1.  Aspects  of  modelling  that  affect  optimisation 


1.1.  Introduction.  Tliis  survey  paper  has  two  main  purposes;  to  summarize  (briefly)  certain 
aspects  of  modelling  that  influence  the  performance  of  optimization  algorithms,  and  to  describe 
recent  advances  in  methods  for  nonlinear  programming  that  influence  the  solution  of  practical 
problems.  These  two  themes  arc  not  uncoimectcd.  A  well  constructed  mathematical  model 
should  be  such  that  the  bad  effects  of  ill-conditioning,  degeneracy  and  inconsistent  constraints 
arc  minimized.  Ironically,  the  pmrposc  of  good  software  is  to  deal  effectively  with  precisely  these 
problems.  Therefore  it  is  not  surimsing  that  much  of  the  insight  necessary  to  construct  a  well- 
posed  mathematical  model  is  pertinent  to  the  formulation  of  robust  algorithms. 

The  principal  problem  of  concern  will  be  the  nonlinear  programming  problem: 


NP 


minimize 

scat- 


subject  to 


F{x) 


<  tt. 


where  F(x)  (the  objective  fitnetion)  is  a  smooth  nonlinear  function,  is  a  constant  matrix  of 
constraints,  and  c(z)  is  a  vector  of  smooth  nonlinear  constraint  functions.  The  objective  function 
F  and  the  constraint  fimetiuns  taken  togcth<;r  comprise  the  problem  fimetious.  Unless  otherwise 
stated,  the  problem  functions  will  be  assurnKMl  to  be  at  least  twice-contiuuously  differentiable. 
(Methods  that  require  this  degree  of  smoothness  will  usually  work  if  there  ore  isolated  disconti¬ 
nuities  away  from  the  solution).  We  shall  use  «/{x)  to  denote  the  griulient  of  F(x),  and  a,(z)  the 
gradient  of  Ci(x).  The  solution  of  NP  will  be  denoted  by  x* 

Wo  shall  begin  with  a  statement  <if  ten  “inmlelliug  priuciph's”  tliat  may  help  to  make  the  re¬ 
sulting  probh'in  NP  more  susceptible  t^t  existing  noiilineiU'  ]>rograiiiiiiing  soft  war<'.  This  is  followed 
by  a  review  of  the  practical  aspects  of  quasi-N<rwtoii  8CM|uentinl  ipuulratic  programming  (SQP) 
methods  for  nonlinear  programiiiiug.  We  conclude  by  preiMuiting  examples  of  the  application  of 
SQP  methods  to  some  illustrative  optimization  problems. 


1.2.  Some  basic  modelling  principles.  Oiu*  observations  of  pnicticjil  optimization  problems 
have  iiidicati'd  th.nt,  even  with  the  b<'st  available  software,  tin;  I'llich'iit  optimization  of  a  model 
can  be  <'rili<'ally  de])eiideii<  on  certain  ]iroperties  of  the  foriiiulatioii.  It.  is  offt'ii  the  Cjis<*  that  the 
forinulator  of  the  model  must  make  iiuiihtous  d<Tisions  that  do  not  affert  the  nrcuracy  to  which 
the  model  reflcHrts  the  real  world,  yet  are  crucial  to  whether  the  iiiodid  is  aiueuablc  to  solution  by 
an  optimization  algorithm. 

Oiu*  experience  with  the  role  of  modelling  in  uunierical  optimization  will  be  summarized  by 
a  list  of  t«'n  “modelling  principles’' .  These  principles  may  serve  as  a  gtiide  for  those  who  have 
little  knowledge  of  the  intricacies  and  pob'iitiai  pitfalls  of  modem  optimization  cod<*s.  They  have 
bmi  derived  from  our  own  raperieiices  with  real  problems. 

Of  course,  the  nature  of  possible  nnnlels  varies  so  much  that  it  is  impossible  to  treat  all 
rehwant  aspects  of  modelling.  Tlie  main  thesis  of  tlirae  principles  is  that  dcvelo]>ers  of  models 
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should  consider  in  the  initial  stages  the  ultimate  need  to  solve  an  optimisation  problem,  since  it 
is  unlikely  that  optimization  software  will  ever  reach  the  state  wherein  a  general  routine  Cem  be 
used  with  impunity  for  all  problems. 

For  additional  nuiterial  on  aspects  of  modelling  that  influence  the  performance  of  optimiza¬ 
tion  methods,  the  reader  is  rcfiTred  to  Gill,  Murray  and  Wright  (1381). 

BASIC  MODELLING  PRINCIPLES 

I.  Formulate  a  simple  model  first  and  add  features  in  conjunction  with  running  the  optimiza¬ 
tion. 

A  model  to  be  optimized  shoidd  be  developed  by  striking  a  reasonable  balance  between  the  aims  of 
improved  accuracy  in  the  model  (which  usually  implies  added  complexity  in  the  formulation)  tmd 
increased  ease  of  optimization.  Tliis  might  be  achieved  by  invoking  an  optimization  procedure  on 
sucemsively  more  complicated  versions  of  the  model,  in  a  form  of  “stepwise”  rcfiiiemeiit.  Thus, 
the  effects  of  each  reSnemeut  in  the  model  on  the  optimization  process  can  be  monitored,  and 
fundamental  difficidties  can  be  discovered  much  more  qtiickly  than  if  no  optimization  wire  applied 
imtil  the  model  was  essentially  complete.  This  is  especially  important  when  dealing  with  models 
that  contain  many  interconnected  subsystems,  each  requiring  extensive  calculation. 

II.  Attempt  to  use  smooth  problem  functions. 

Probably  the  most  flmdaincntal  property  of  the  problem  flinctions  with  respect  to  ease  of  opti- 
inizati;)tj  is  tUffvrcutiahility,  which  is  important  because  algorithms  arc  basi'd  on  using  available 
iiifurmation  about  a  function  at  one  point  to  de«lucc  its  behavior  at  oth<T  points.  If  the  problem 
functions  are  twic«'-continuously  diffenmtiable,  say,  the  ability  of  an  iilgorithm  to  locate  the  sohi- 
tioii  is  greatly  enhtUictHl  coinpOnsl  to  the  case  when  the  problem  functions  are  uou-diffcrcntiablc. 
Then'fore,  most  optimization  soft  ware  is  designed  to  solve  smooth  problems,  and  there  is  a  great 
incentive  to  formulate  differentiable  model  fiincticms.  A  ustffiil  feature  of  methods  for  smooth 
problems  is  that  th<>y  tend  to  give  more  iufonnatiou  coiiceniing  the  quality  of  the  solution.  For 
exaijqtle,  some  nonlinear  programming  iiM>tlio<1s  can  be  shown  to  rahibit  a  superluiear  Rite  of 
ronv<*rgence  in  the  ii<'ighhorho<id  of  a  local  iiiiiiiniiiin.  If  a  method  tenuinaU's  at  a  point  for 
which  this  rnt<'  of  convergence  is  <'xhihit<Hl,  the  iumt  will  h.'ive  si>ine  ronfideiictr  th.at  tin*  final  pouit 
is  close  to  a  local  minimum.  (When  solving  a  problem  ou  a  digital  coiupiiti'r,  we  netMl  to  define 
carefully  what  we  mean  by  a  “smooth”  probkm.  hi  reality,  all  software  minimizes  a  function 
fl{F{x)),  wliich  is  the  floating-point  representation  of  F{x).  The  fimetion  fl(F{x))  is  piecewise 
constant  at  the  roiind-oiF  level.  If  we  define  to  be  the  absolute  precision  of  F,  i.e., 

|F(x)-/f(F(.))|  =  t,,(x). 

then  algorithms  for  smooth  problems  will  work  whenever  changes  in  the  variables  produce  changes 
in  F  that  are  much  greater  than  Ca  •  Note  that  the  vast  nuqority  of  optimization  software  assumes 
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that  F  is  computed  to  full  precision-,  i.c.,  it  is  assumed  that  is  of  the  order  of  Cm||^(x)||,  where 
Cm  is  the  relative  machine  precision.) 

III.  Avoid  defining  probiem  functions  that  are  the  result  of  some  iterative  procedure  (such 
as  the  solution  of  a  differential  equation  or  the  evaluation  of  an  integral). 

Problem  functions  defined  by  an  iterative!  procedure  are  often  the  source  of  subtle  discontinuities 
that  may  inqxMle  the  progress  of  the  optiniixatioii.  The  solution  of  these  subproblcms  to  fiill 
machine  precision  (even  if  possible)  generally  requires  considerable  computational  effort,  and  thus 
tends  to  be  regarded  as  unwarranted  by  the  modeller,  since  the  integral  or  differential  equation 
(or  whatever)  is  only  <ui  approximation  to  some  more  complicated  real-world  phenomenon. 

The  use  of  an  iterative  procedure  to  define  a  problem  function  most  often  occurs  when  the 
variables  of  the  problem  are  ftinctions  of  a  continuous  parameter  (in  an  optimal  control  problem, 
for  exainph-).  In  many  instances,  an  effective  strategy  for  this  type  of  problem  is  to  discretize  the 
problem  before  applying  the  optimization  method.  Accurate  solutions  to  the  continuous  problem 
me  then  found  by  refining  the  discretization  between  optimizations.  Such  a  strategy  illustrates 
again  that  it  is  often  worthwhile  to  interleave  modelling  and  optimization,  since  the  creation  of 
an  iucrcasitigly  <)ccurate  discretization  is  in  fact  a  modelling  process. 

IV.  Think  carefully  about  the  nature  of  the  constraints. 

It  is  not  always  appreciatc<l  that  substantial  improvements  in  performance  and  robustness  can 
result  when  iiu-thods  exploit  the  <liff('rent  properties  of  simple  bounds,  linear  coustniints  and 
nonlinear  constraints.  Whenever  possible,  the  user  should  isolate  the  linear  constraints  from  the 
nonlinear  constraints  and  use  software  that  differentiates  between  constraint  types  during  the 
optimization.  Unfortunately,  some  iirobleiii  formats  gnarautcc.’  that  a  linear  constraint  will  be 
treated  as  a  nonlinear  constraint.  For  example,  in  the  chiss  of  geometric  programwing  problems 
the  obj<*ctive  and  constraint  functions  are  sums  of  functions  of  the  form 

^.(a:)  =  a,i;*'a;;‘’  -1“*-,  (1.1) 

where  the  Xj  are  the  viu-iablcs  (constraiiuHl  to  lie  positive)  and  the  lure  constants.  The 
trausroriiial.ion  of  a  linear  constraint  inhi  .a  sum  of  riinctions  of  the  form  (1.1)  unnect'siuirily 
incre<uu*s  the  d<<gree  of  dilliciilty  of  the  problem. 

The  transformation  of  a  problem  from  one  form  to  another  was  often  unavoidable  in  tlic 
past  br-cause  less  softwcu-i*  was  availabhr.  When  algorithms  for  uiiconstrc-iined  optimization  were 
more  numerous  and  more  elfiM-tive  than  for  constraineil  problems,  it  was  common  practice  for 
simple  bound  constraints  to  bo  trcattnl  by  a  change  of  Viuriable.  Today,  however,  algorithms  for 
problems  with  only  siinplt!  botiiids  or  linear  coiistraiiitj*  are  comparable  hi  efficiency  to  uncon- 
8trniiif!d  algorithms.  Tlierefore,  it  is  virtually  iievor  worthwhile  to  transform  bound-constrained 
problems  (in  fact,  it  is  often  beneficial  to  add  boimds  on  the  variables  —  sec  below),  and  it  is 
rarely  appropriate  to  alter  linearly  constrained  problems. 
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IVansformations  can  be  used  effectively  to  transform  nonlinear  constraints  into  simple  bound 
constraints  (for  example,  by  using  polar  coordinates  instead  of  cartesian  coordinates  to  deid  with 
range  constraints  of  the  form  I  <  <  u).  However,  care  should  be  taken  to  ensure  that  the 

transformation  does  not  learl  to  a  new  problem  that  is  more  difficult  to  solve,  or  has  additional 
(spurious)  solutions. 

V.  Do  not  attempt  to  eliminate  equality  constraints  from  the  probiem. 

Modellers  often  assume  that  since  there  may  be  no  physical  significance  to  a  point  at  which  non¬ 
linear  equality  constraints  arc  violated,  such  constraints  should  be  satisfied  exactly  at  all  stages  of 
the  optimization.  Accordingly,  users  often  attempt  to  “eliminate”  nonlinear  equality  constraints 
from  the  problem  by  the  following  method.  The  variables  are  partitiomxl  into  “independent” 
and  “dependent”  sets.  The  minimization  is  then  performed  only  with  respect  to  the  independent 
variables,  and  the  dependent  variables  are  determined  by  “solving”  the  equality  constraints.  To 
be  more  precise,  let  x  denote  the  vector  of  dependent  variables  and  u  the  vector  of  independent 
variables.  The  constrained  problem 

minimize  F(x,  u) 

subject  to  c(x,  u)  =  0, 

is  solv«l  by  expressing  c(i,  u)  =;  0  as  x  =  T(u)  for  some  transformation  T,  and  then  minimizing 
F{T{v.),u)  with  respect  to  u.  This  strategy  is  particularly  common  when  there  arc  special  fast 
methods  for  solving  the  equations  c(z,  u)  =  0. 

W<’  do  not  recommend  this  approach  if  the  constraints  have  any  significant  degnx*  of  iionlin- 
o.-xrity.  Firstly,  it  is  difficidt  to  impose  any  simple  bounds  upon  the  dependent  variables.  S«'condly, 
the  n^ndting  algorithm  is  of  the  “c<instraint-fo1Iowing”  ty]>e,  which  will  tend  t.o  be  less  efficient 
than  other  methods.  Our  experience  is  that  the  total  compubitiouid  effort  required  to  solve  for 
the  dependent  Vtariables  at  every  trial  point  is  not  usually  worthwhile,  compared  to  expending  a 
similar  amount  of  effort  in  the  optimization  without  eliminating  the  variabl(>s.  The  user  would 
be  Ix'tter  advised  to  use  a  general  nonlinear  progranuning  method. 

VI.  Distinguish  between  "hard”  and  "soft”  constraints. 

In  iiiaiiy  prohh’ius,  (In'  hound  constraints  may  In*  rlassilied  as  c*itlier  “h;ir<r  or  “soil".  For 
exaiiijde,  lund  hounds  could  specify  the  ri'gioii  in  which  the  problem  functions  are  well-defiiusl. 
A  soft  constraint  miglit  reprc'sent  a  bound  whose  violation  we  are  prepared  to  tolerate  if  this 
resulted  m  a  large  decrease  in  the  objective  function.  In  some  cases,  variables  may  have  both  a 
soft  bomid  {ind  a  hard  bound. 

A  good  nudhod  of  treating  a  soft  bound  of  the  form  Z{  >  6^  is  to  exclude  it  from  the  simple 
bound  constraints  and  includ<*  it  in  the  objective  fuiictioii  in  the  fonii  of  a  mild  pmalty  term  of 
the  form  jp,'  |z,-  -  6,]’ ,  where  (y]_  deiiot<«  tlie  hmrtion  min{0,y}.  Mo<l«wt  values  of  pi  do  not 
lend  to  the  severe  ill-conditioning  that  occurs  with  traditional  penalty  methods.  If  the  v.arinble 
is  also  constraint'd  by  a  hard  bound,  the  constriunt  can  be  treatcnl  explicitly  in  the  usual  way. 
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As  a  general  rule,  the  presence  of  simple  bounds  simplifies  a  problem,  by  reducing  dimension* 
ality.  However,  if  a  bound  has  only  a  slight  impact  upon  the  problem  (i.e.,  if  the  objective  function 
docs  not  change  rapidly  as  the  bound  is  perturbed),  the  degree  of  difficulty  of  the  problem  may 
be  incrciiscd  b('causc  the  algorithm  needs  to  resolve  smidl  Lagrange  multipliers.  If  many  of  the 
bounds  with  small  multipliers  would  be  classified  as  “soft” ,  the  use  of  a  penalty  term  tends  to 
remove;  these  coustnuats  from  the  active  set. 

Similarly,  a  penalty  function  of  the  form  —  d<)’  can  be  used  to  produce  a  solution  that 

is  biased  towards  some  “desired”  value  d. 

Vii.  Avoid  modelling  near-dependent  equality  constraints 

Constraints  occur  in  problem  formulations  for  a  variety  of  reasons.  Often  the  very  nature  of  the 
variables  imposes  an  equality  constraint  for  example,  if  the  variables  {x,  }  represent  proportions 
or  probabilities,  this  gives  rise  to  the  constraint  =  1  (as  well  as  non-negativity  n;strictions). 
Constraints  of  this  type  arc  “genuine”  equalities,  in  the  sense  that  the  computed  solution  must 
satisfy  them  exactly  (where  “exactly”  means  “within  working  precision”).  However,  it  is  not  un- 
usmd  in  modelling  th<it  constraints  that  might  seem  initially  to  be  firm  e<iuality  constraints  should 
be  treated  instead  as  constnaints  that  need  not  be  sjitisficd  with  maximum  possible  accuracy.  For 
example,  this  situation  occurs  when  the  underlying  model  is  known  to  contain  imicruracics.  In 
some  problems,  forcing  constraints  of  this  type  to  be  equalities  may  cause  there  to  be  no  fea¬ 
sible  solution,  or  may  distort  the  properties  of  the  »>lutiou  if  the  corresponding  constraints  arc 
ill-conditioned. 

We  shall  use  a  simitlc  two-dimensional  example  to  illustrate  not  only  the  the  potential  diffi- 
cidties  associat(;d  with  dt'pendcnt  conslraiiits,  but  also  one  simple  method  for  dc;ding  with  them. 
Siqipose*  tlial.  we  n'(|uir(<  the  iiuuimum  of  a  function  of  two  variables  subjexrt  to  two  iiidepe'ndcnt 
]ine<ir  equality  coiistrciints.  In  this  ca.se,  the  solution  lies  at  the  i>oint  of  intersection  of  the  con¬ 
straints  (the  objective;  fuiictiem  h<is  ne>  influence  e>ii  the  solutiem).  He>we've;r,  suppose  that  one  of 
the;  two  ceinstraiiits  is  re;ally  a  copy  of  the  either,  but  that  elite  te>  small  erreirs  in  the  moele;lling 
preicess,  the  cemstraints  are  neit  exactly  de’iienelent.  Neiw,  the  twe>  eronslraints  are  ne'arly  paralle;! 
and  the  point  of  intc;rseTtioii  lies  at  a  point  that  may  be  very  elilfere'iit  fremi  the  sediiliem  of  the 
cssi'iilially  espiivale'iit  ]>robli<m  posesl  with  a  .single*  e’einstraint. 

A  clue  le>  eme  metheid  of  re'solution  eif  this  difficulty  lies  in  the  obse;rvatiem  that  if  e>nc  or 
the  othe*r  of  the;  constraints  were  iguore-el  eluring  the  eiptimisation,  it  woiilel  be  viola(.e‘el  by  only  a 
very  small  ipnuitity.  Suppose  that  each  depcnelent  ceiiiality  constraint  is  replaced  by  .-m  ineqimlity 
constraint  with  a  very  narrow  range.  For  e;xamplc,  the  linear  constraint  o^x  =  b  woulel  be  rcplacexl 
by 

b  -  S  <  oTx  <  &  -f 

where  ^  is  a  small,  but  ne>t  negligible,  positive  quantity.  (Exactly  the  same  transformation  can  be 
niaele  for  a  nonline’ar  constraint.)  In  the  e'xainplc  above*,  the  activevset  strate*gy  of  the  eiptimisation 
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procedure  will  decide  that  a  lower  value  of  the  objective  function  is  achieved  if  one  of  the  equality 
constriiints  is  not  satisfied  exactly  at  the  solution. 

In  Section  3.5  we  shall  use  tliis  type  of  range  constraint  in  methods  that  are  robust  on 
problems  with  dependent  constraints,  and  show  that  the  idea  of  allowing  certain  constraints  to 
be  violate<l  by  a  snudl  quantity  recurs  in  many  practical  solution  techniques.  This  is  only  to  be 
cxpwted,  since  it  is  never  possible  to  s,‘itisfy  constraints  exactly  in  finitc'-precision  arithmetic. 

Vlli.  Use  infomiation  about  the  problem  to  scale  the  variables  and  constraints. 

It  is  well  known  that  the  “right”  scaling  of  variables  and  constraints  can  dramatically  improve 
the  efficiency  and  accuracy  of  oi)timi/iation  methods.  The  scale  of  a  problem  is  the  measure  of 
the  relatives  im]>ortance  of  the  variables  juid  constraints  —  or,  equivalently,  the  scale  of  a  problem 
is  a  statc'inent  of  what  is  “Icirge”  and  what  is  “small”  in  a  problem. 

In  the  absence  of  any  other  information,  an  algorithm  will  generally  assume  that  each  variable 
or  constraint  has  equal  weight  in  the  optimisation.  For  example,  if  a  unit  change  in  a  variable 
produces  a  .small  change  in  F  compared  to  otlier  variables,  the  algorithm  will  tend  to  make  a 
larger  change  to  that  variable,  ('learly,  if  the  variables  do  have  equal  significance,  this  situation 
is  quite  sati8fa<  tory.  However,  if  the  small  variation  in  F  is  due  to  the  fact  that  the  objective 
function  is  ;Umost  independent  of  the  variable,  the  problem  should  be  rescaled.  (Note  that  in  this 
ciise,  fixing  a  variable  or  constraint  is  an  acceptable  form  of  problem  scaling.) 

The  most  common  form  of  scaling  is  to  define  new  variables  and  constraints  using  a  linear 
trcansforination.  (For  simplicity,  in  the  following  discussion  we  shall  assume  that  the  nonlinear 
constraints  are  equalities  of  the  form  Ci{x)  =  0.)  Suppose  we  define  new  variables  x  and  constraints 
c  such  that 

x  =  DiX  .and  f.  =  Djc,  (1.2) 

where  Di  .*ui<l  D2  .are  nousiugul.u'  matrices  (usually  diagonal).  With  this  sc.'iling,  the  deriva¬ 
tives  of  the  origiiual  .'uid  transAiriiic'd  objevtive  function  .are  related  by  V*F  =  DiVF  .and 
=  DiV^FDi.  Siiniharly,  for  the  constr.aints  we  have  A  =  D^A,  whore  A  and  A  are  the 
.I.acobiaii  m.atrices  of  the  original  .and  tr . an sfor lin'd  constr.aints.  Wlii'ii  Di  .and  an;  diagon.al,  an 
inl.er]>n'tat  ioii  of  tlu'  scaling  (1.2)  is  th.at  D\  .and  D2  “nuik”  tin'  eh'inents  of  the  gnulient  vector 
and  consiraiiits  so  that  e.u'h  is  of  npial  iinporfancc. 

The  important  thing  to  b(;ar  in  iiiiinl  is  th.at  a  b.adly  scale'll  problem  is  I'sse'iitially  an  ill- 
cnudJtioiml  problem.  Dadly  scaled  viuiables  Ie.ad  to  ill-conditioned  Hessi.an  m.atrices;  Inully  scaled 
constr.aints  give  rise  to  near-singular  Jacobian  matrices.  Within  optimir..ation  routines,  conver- 
gi'iicc  toleranci's  and  other  criteria  ore  necessarily  based  upon  an  implicit  ek'finitiein  of  “small” 
.anil  “large”,  ami  thus  variable's  with  widc'ly  varying  orders  of  magnitude  may  cause  difliculties 
for  senile  algorithms. 

Arguably  the  most  important  rule  of  scaling  is  th.at  the  v.ari.abli's  of  the  sc.ali'd  proble;m  should 
be  of  similar  magnitude  and  of  order  unity  in  the  region  of  inte;re»t.  If  typical  values  of  .all  the 
vnri.able8  are  known,  a  problem  can  be  transformed  so  that  the  variables  are  all  of  the  s.amo  order 
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of  magnitude.  In  this  ease,  the  transformation  £>i  of  (1.2)  is  a  diagonal  matrix  whose  elements 
arc  tyi>ic<d  values  of  x. 

An  important  property  of  a  theoretical  algorithm  is  that  of  invariance  with  respect  to  the 
scaling  (1.2).  A  scale-invariant  algorithm  has  the  property  that  when  it  is  applied  to  both  the 
origincil  problem  and  the  traiisforinod  inoblem,  the  resulting  sequences  of  iterates  {x;t}  {^k} 

satisfy  Xk  —  DiXk  for  all  k.  Tlus  property  does  not  imply  that  it  is  unnecessary  to  choose  a 
s('nsibh>  scaling  for  a  i>roblcm  when  a  scale-invariant  <ilgorithm  is  being  used.  Scniv-invariniice 
cannot  be  acixieved  iu  a  practical  implementation  of  aa  algorithm.  Not  only  is  computer  arithmetic 
not  scale-invariant,  but  also  it  is  impossible  to  devise  a  sc<de-iuvariant  test  that  an  ixlgorithm  has 
converged  to  a  point  which  satisfies  the  necessary  or  sufficient  conditions.  Miauy  algorithms 
treat  qu.uititics  that  arc  “sufficiently  small”  in  magnitude  <w  “negligible”  (in  effect,  as  zero). 
However,  .since  there  is  no  universal  d(ffinition  of  “small”,  it  is  impossible  to  formulate  a  scale- 
invariant  procedure  for  distinguishing  Ix'twcen  quantities  that  should  be  treated  as  zero,  and 
.s<';il('-dependeut  quantities  that  should  not  be  neglected.  To  illustrate  this  point,  consider  the 
aiqdication  of  any  quasi-Newton  method  that  is  scale-invariant  in  the  sense  just  (lescribed.  If  the 
algorithm  w<‘re  applio<l  to  the  scaled  juid  unsealed  problem,  we  may  regard  the  two  stiqucnces 
(ifc)  and  (ifc)  as  belonging  to  two  different  spaces.  Each  value  x*  in  the  i-spacc  is  related  to  a 
value  Xk  in  the  x-space  by  the  forinida  Xk  =  DiXk-  The  optimality  test  of  the  algorithm  must 
involve  consideration  of  the  magnitude  of  the  gradient  norm.  Howevc*r,  since  tin*  gnadient  norm  is 
tending  to  zero  with  a  <lilfereut  value  (either  ||<7fc||  or  ||i?i<7fc||)  in  each  sf>ace,  this  optinudity  test 
ciuinot  be  made  sciUivinvarimit.  Thus,  the  number  of  elements  in  each  sequence  may  b(?  dilTcrcnt. 
Furtluxinore,  the  initial  estimate  of  the  Ilessiau  is  idinost  always  defined  as  a  unit  matrix.  If  this 
is  <lone  in  both  the  i-  and  x-spaces,  tin;  .scale  invariance  is  immediati'ly  lost. 

Another  part  of  <m  .dgorithm  tlnat  is  unavoithibly  scale-dependejit  is  the  critc'rion  for  dc'ciding 
when  a  nonline.ar  constraint  of  the  form  c,(i)  =  0  is  satisfied.  Given  a  constraint  value,  say 
c,(x)  =  10  it  is  iuxes.sary  to  know  something  about  the  natunil  .scaling  of  th('  problem  in  order 
to  make  a  sensible  deci.sion  about  whetluT  the  constraint  is  sufficiently  close  to  zcto.  The  only 
way  of  overcoming  this  problem  is  to  K’t  the  us<!r  «1<H-Mle  wlmn  a  constraint  is  sullicic'utly  satisfied. 
This  hvuls  t*i  (Ik*  idea  of  ;i  user-<l<'fim’<l  fen.sihildy  tolerance  A,  that  defilK^s  wlmt  is  really  “small” 
for  each  constraint.  For  exaiiqtle,  the  part  of  the  teriuiiiatiou  crileriou  that  concerns  the  tt'st  for 
feasibility  woidd  retpiire  that  |ci(x)|  <  6i  for  eru’h  constraint. 

IX.  Try  to  provide  as  much  information  about  the  function  as  possible. 

As  a  general  nile,  algorithms  tend  to  lx;  more  successful  and  robust  when  more'  information  about 
the  problem  is  provided.  For  examide,  if  the  problem  functions  are  smooth,  algorithms  that  rise 
first  and  secoml  derivatives  perform  much  better  than  algorithms  t.hat  use  function  values  only. 

It  is  generally  perceivtxl  that  a  second-derivative  dgorithm  is  rar<'ly  useful  bix-ausc  the  cost 
of  computing  the  second  derivatives  may  be  scver.al  orders  of  magnitude  larger  than  that  of  cal¬ 
culating  first  (k'rivatives.  This  is  un<loubtedly  true  in  some  situations,  but  there  are  a  remarkably 
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large  number  of  problems  for  which  the  first  and  second  derivatives  of  the  problem  fiinctions  may 
be  obtivined  for  about  the  same  cost  —  for  example,  geometric  programming  problems,  where  the 
objective  and  constraint  functions  are  of  the  form  (11).  Morw>ver,  it  is  relatively  str.aightforward 
to  design  softwiU’c  that  will  automatically  differentiate  the  original  problem  functions. 

Other  examples  of  problems  with  clieaj)  higher  derivatives  occur  in  exponential  fitting  and 
factorabU'  programming  (for  example,  see  McCormick,  1983). 

X.  Take  special  care  to  check  that  the  problem  functions  and  their  derivatives  are  programmed 
correctly. 

Before  ombrnking  on  a  scries  of  optimization  runs,  the  user  should  verify  that  the  rode  which 
defines  the  problem  f.inctions  is  corr<H:t.  One  obvious  check  is  to  cv.aluate  the  problem  functions 
at  a  point  where  their  values  arc  known. 

Errors  in  jirogramming  the  function  may  be  quite  subtle  in  that  the  function  vjUuc  is  “al¬ 
most”  correct.  For  example,  the  function  may  be  accurate  to  less  than  fiill  precision  because  of 
the  inaccurate  calculation  of  a  subsidiary  quantity,  or  the  limited  accuracy  of  data  upon  which 
the  function  depends.  A  common  error  on  machines  where  numerical  calculations  are  u.sually 
performed  in  double  precision  is  to  include  even  one  single-precision  constant  in  the  calculation 
of  the  function;  since  some  comi)ilers  do  not  convert  such  constants  to  double  precision,  luvlf  the 
correct  figures  may  be  lost  by  such  a  seemingly  trivial  error. 

Incorrect  calculation  of  d<Tivatives  is  by  far  the  most  common  user  (iror.  Such  errors  arc 
almost  nev<‘r  small,  and  thus  no  algorithm  ctaii  perform  correctly  in  their  presence.  This  is  why 
we  recommend  that  some  sort  of  consistency  cluTk  on  the  ilerivatives  In*  performed.  The  most 
st.raiglitforwanl  iikvuis  of  checking  for  <Trors  in  tlw  d<'rivativ<’  involves  c<imparing  a  finite-difference 
a])proximation  with  tin?  sui>pos(Hlly  exact  value. 

1.3.  Some  useful  features  of  an  implementation.  In  the  following,  we  give  some  features 
of  ail  “ideal”  inqih'mentation  that  would  help  the  user  apjily  tin'  modelling  principh's  most  ef¬ 
fectively.  This  list  is  not  intended  to  Ix'  exhaustive  mnl  we  h.ave  inchnleil  some  items  in  the  list 
that  have  not  been  referri’d  to  in  tin*  text.  We  «lo  not  claim  that  it  is  possible  to  implement  every 
feature  in  all  circumstances. 

•  The  nn't.lnnl  should  treat  simple  bounds,  linear  constraints  mid  nonlinear  constraints 
separately.  Moreover,  the  method  should  effectively  deal  with  nonlinear  constraints, 
yet  renicain  competitive  with  unconstrained  or  linearly  constraiiiwl  methods  if  constraint 
nonlinearities  larc  not  present. 

•  There  should  be  the  o])tion  of  computing  the  problem  fiinctions  only  at  points  that 
satisfy  the  linear  constraints. 

•  The  software  should  print  all  the  inforination  necessary  for  the  u.scr  to  cln'ck  how 
closely  the  final  point  satisfies  the  necessary  and  sulficicnt  conditions  for  an  optimum. 

•  The  user  should  be  abh'  to  spi’cify  a  feasibility  tolerance  for  ea«’h  constraint. 
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•  It  should  be  possible  to  solve  problems  for  which  the  precision  of  the  problem  functions 
is  not  necessarily  close  to  the  machine  precision. 

•  The  user  should  be  able  to  scale  the  problem  by  specifying  the  scaling  matrices  Di 
and  D2  of  (1.2). 

•  The  user  should  have  the  option  of  chocking  the  problem  derivatives  before  starting 
the  ihiTiimisation. 

2.  Quasi-Newton  methods  for  unconstrained  minimization 

Qua.si-Newton  methods  are  iterative,  <uid  generate  a  sequence  {art}  that  is  intended  to  converge 
to  art  At  th<!  Ar-th  iteration,  the  new  iterate  is  defined  by 

a:fc+i  =  iCfc  +  ttfcPfc.  (2.1) 

where  a*  is  a  non-negative  scalar  called  the  step  length,  and  pjt  is  an  n-vector  called  the  search 
direction.  These*  methods  utilize  the  values  of  the  problem  fiinctions  and  their  gradients  at  trial 
iterates,  but  do  not  assume  the  aviiilability  of  higher  derivative  information.  (When  explicit  first 
derrivatives  <ire  not  available,  quasi-Newton  methods  can  be  impleiiM'nte'd  u.sing  linite-differcnce 
approximations  to  the  gradient  --  see,  e.g.,  Gill,  Murray  and  Wright,  1981.)  A  typical  iteration 
of  a  qua.si-Newton  method  comprises  tliroi*  related  parts:  computation  of  pk,  choice  of  Ofc,  and 
tipdating  the  necessary  matrix  factorizations. 

The  search  dinxtion  pk  in  unconstrained  optimization  is  defint'd  by  the  equation  IlkPk  =  ~ffk> 
where  Ilk  is  •'Ui  approximation  to  the  Hessian  matrix  of  F.  Many  computation.al  benefits  accrue 
frttin  updating  a  fju'torized  form  of  Ilk-  Suppose  that  where  Rk  is  ,'ui  upi>er- 

t.riangular  matrix  (the  CHiolesky  factor  of  //k).  The  search  dirtn-tion  is  then  obtained  by  solving 
two  triangular  systems; 

^fc7=-ffk  RkPk=q-  (2.2) 

The  main  purpose  of  t,he  linesearch  is  to  force  steady  progress  by  computing  a  step  length  Uk 
such  that  ^’(xk  +  «fcPk)  is  “suHiciently  k’ss”  than  F(xk);  i.e.,  the  decrease  can  go  to  zero  only  ,•« 
the  solution  is  a])proiu;hed. 

It  is  widely  {U’«'epl.<*d  (.oday  that,  (he  best  (|uasi-N<*w(ou  update  is  given  by  t  in*  UFCIS  formula: 

Hk+i  =  Hk  -  -^ykvL  (2-3) 

Vk^k 

where  Sk  =  Xk+i  -  Xk,  «fc  =  //fcSk  and  j/k  =  ffn  1  -  j/fc.  If  Hk  is  positive  definite,  ffk+i  as  defined 
by  (2.3)  will  be  positive  definite  if  and  only  if  the  a]>proximate  curvature  Vk^k  satisfies 
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DenniH  and  Schnabel  (1981)  have  shown  that  the  BFGS  update  (2.3)  may  be  expressed  as  a 
nwk-onc  update  to  Rk.  Let  ft  au<l  7  denote  the  s<-alars  («*«*.)»  and  (y*St)»  respectively.  The 
BFGS  update  may  then  be  written  as  Hk  \  t  =  where 

s  r,  T  1  ,>  11 

Itk+i  =  Rk  +  vw  ,  with  V  -  ^RkSk,  w  =  -Vk  -  -^Uk. 

P  1  P 

The  matrix  Rk,  which  is  not  ujipor- triangular,  may  be  nrstored  to  upper- triangular  form  by 
finding  an  orthogonal  matrix  P  such  that 


PRk+i  =  Rk+i, 

where  Rkn  is  upper  triangular.  Then  Hkn  —  RJuRkH  ~  Rk-i  iR^PRk^i  ~  RkuRkH’ 
required.  A  suit.ible  matrix  P  cjui  be  construcl.ed  from  two  sweeps  of  plane  rotations;  for  more 
details,  see  Gill  ct  al.  (1974). 

Since  pk  satisfies  HkPk  —  ~9k,  two  matrix-vector  multiplications  may  be  avoided  in  the 
implementation  of  the  BFGS  update  (2.3).  Substituting  from  (2.2),  we  obtain 

1,11 
v=  -q  and  w  =  -yk  +  -gk, 
a  7  CT 

where  a  =  \glp^\i  and  7  =  (y*  s*)^. 

Represi'iitation  of  Hk  by  its  Choh’sky  factors  avoids  a  serious  jiroblein  that  would  otherwise 
arise  in  quasi-Newton  methods:  the  l<»8s  of  positive-definiteness  through  rounding  errors.  With 
exact  tirithiiK'lic,  satisfaction  of  (2.4)  should  ensure  that  the  BFGS  update  gi'uerati's  strictly 
Iiositivcvdi’fiiiite  Hessian  ajiproximations.  However,  in  pnwtire  the  formula  (2.3)  c.ui  leiwl  to  a 
singular  or  indefinite  matrix  Hk  \  i-  When  Hk  is  represented  by  its  Cholesky  factorisation  and 
updates  are  perfornu’d  directly  to  the  factorization,  every  Hk  will  be  iiiniicriciiUy  positive  definite. 

Maintt'nance  of  positive-definiteness  is  considered  to  be  a  crucial  element  in  the  success 
of  qiuisi-Newton  methods  in  uuconstr.iiueil  optimization  (see  Dimnis  ;uid  More,  1977).  hi  this 
context,  the  linesearch  can  iilso  liavi-  tin*  important  function  of  guarantei'ing  that  condition  (2.4) 
holds  al.  t'vcry  iteration.  In  particular,  ni»'tli<»ds  of  .safeguarded  ]>olynomial  interjiolatioii  (sev, 
<’.g.,  Brent,  1973;  Gill,  Murray  and  Wright,  1981)  cjui  find  a  value  of  «jt  that  satisfies 

R(xk)  -  ♦-  «fcPfc)  >  -pf^kgkPky  (2-5«) 

11/(3:11  +  ^kPkfPkl  <  -nUkPky  (2-56) 

where  0  <  ft  <  q  <  I  ami  ft  <  Condition  (2.5a)  ensures  a  “suflicit'iit  diH'rease"  in  F,  mid  (2.56) 
guarantiH's  satisfaction  of  (2.4). 

A  popular  alternative  linesearch  technique  is  known  ns  backtrnckmg  (see,  e.g.,  Dennis  and 
Schnabel,  1983).  Given  a  fixed  0  <  p  <  1,  a  siHiueiice  {Pj}  is  generated  that  satisfies  ^0  =  1  Mid 
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>  p/3j.  The  value  of  is  taken  as  the  first  element  in  the  sequence  {Pj}  satisfying 
the  sufficient  decrease  criterion 

^(xk)  -  F{xk  +  fijPk)  >  -pPjglPk, 

wliere  0  <  /x  <  1.  Since  a  backtracking  method  can  never  generate  a  value  of  a  greater  than 
unity,  (2.4)  may  not  hold.  Most  iiiiplenicntations  of  unconstrained  quasi-Newton  methods  with 
a  backtracking  linoscarch  simply  skip  tlu;  updat<;  in  this  case.  (As  the  iterates  converge,  (2.4)  is 
satisfied  for  the  inithd  step  of  unity,  and  hence  the  difficulty  doc's  not  arise  in  the  limit.)  The 
simplicity  of  bfurktracking  algorithms  and  their  utility  in  convergence  proofs  have  led  to  their 
frequent  appearance  in  the  literature. 

3.  Methods  for  nonlinear  equality  constraints 

3.1.  Basic  theory  and  notation.  In  this  section,  we  consider  methods  for  problems  that 
contain  only  nonlinear  equality  constraints,  i.c. 

NEP  minimise  F(x) 

subject  to  Ci(x)  =  0,  »  =  1, . . . ,  m. 

We  concentrate  on  this  simplified  problem  in  order  to  emphasise  the  treatment  of  nonlinear 
constraints. 

The  Kuhn-Tuck<T  conditions  for  NEP  state  the  existence  of  an  rn-vc'ctor  A*  (the  Lagrange 
multiplier  vector)  such  that 

g(x*)  =  A(xY^*  (3.1) 

(For  a  (h'tailed  discussion  of  first-  and  second-order  Kuhii-Tucker  conditions  for  optiiuidily,  see, 
for  exanij)le,  Fisicco  and  McCormick,  19G8,  and  Powell,  1974.) 

Let  Z(x)  xh'iiote  a  matrix  wluise  coliinuis  form  a  basis  for  the  st't  t)f  vox-tors  orthogonal  to 
tlie  rows  of  A{x);  i.e.,  A(x)Z(x)  =  0.  Aji  equivalent  statement  of  (3.1)  in  Utius  of  Z  is 

Z(x)'^g{x)  =  0. 

The  vtH  lor  Z(x)'^ij(x)  is  terinexl  the  proy'cti'd  gradix'iit  of  F  at  i. 

The  Lagrtutgiim  fuuction 

L{x,/t)  =  F(x)  -  /i’’c(i), 

where  p  is  on  m-vcctor  of  Lagrange- multiplier  estimates,  plays  an  important  role  in  understaiuling 
and  solving  constraiiunl  problems.  Condition  (3.1)  is  a  statement  that  x  is  a  stationary  point 
(with  respect  to  *)  of  the  Lagrangian  function  when  p  =  A*.  One  of  the  second-order  sufficiency 
conditions  for  o]>timality  is  that  the  projccU'tJ  Hessian  of  the  Lagr/wgian  hinetion, 

Z(x)^VH(x,p)Z(x)  =  Z(i)^(vV(*)  -  f;p.V»c.(z))^(*), 

t=i 
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is  positive  (lefiuite  when  x  =  i*  ^  =  A*. 

hi  the  following,  we  consider  acqiwntial  quadratic  programming  (SQP)  methods  for  NEP, 
in  which  the  search  direction  is  the  solution  of  a  quadratic  progrsuiimiug  stihprohlcm  and  the 
stepleugth  achieves  a  sufficient  reduction  in  some  “merit  function”.  The  purpose  of  the  merit 
function  is  to  enforce  steady  progress  to  the  solution  by  bcilancing  the  (usually)  conflicting  aims 
of  reducing  the  objective  function  and  satisfying  the  nonlinear  constriunts. 

The  <iu<ulratic  programming  subproblcm  is  of  the  form: 

niimmisc  ff^p  +  (3.2a) 

subject  to  Akp  =  —Ck,  (3.26) 

where  c/t  and  Ak  denote  c  and  A  evaluated  at  x*.  The  so-called  liaearixod  constraints  (3.26)  repre¬ 
sent  a  first-order  approximation  to  the  nonlinear  constraints  of  the  original  problem.  The  matrix 
Hk  i.s  an  ai>proximation  to  the  Hessian  of  the  Lagraugian  function.  The  Lagrange  multiplier 
vector  of  this  subproblcm  (denoted  by  pk)  satisfies 

ffkPk  +  fffc  =  Alpk^ 


and  may  be  used  as  an  estimate  of  A*. 

SQP  methods  differ  in  their  definitions  of  the  matrix  Hk,  nnd,  as  we  shall  sec  later,  formula¬ 
tion  of  the  <JP  constraints  (3.26).  In  the  next  two  sections,  we  shall  see  how  the  cboicc  of  matrix 
Hk  is  related  to  tin?  method  used  to  solve  the  eciuality-constraiiit  QP  (3.2). 

3.2.  Methods  for  equality-constraint  QP.  All  im'thods  for  solving  (3.2)  may  be  viewed  as 
idternative  iiu’thods  for  .solution  of  the  auginciit4^1  sy.stojjj  of  erjuations  for  p  and  p 


which  expresses  f  lic  optimality  and  fe.isibility  conditions.  (The  subscript  k  has  Ixvn  suppressed 
for  coiiveiiK'iice.) 

Mef.hod.s  for  solving  (3.3)  are  ofi.en  lutsed  upon  constructing  an  (sinivalent,  but  simpler, 
8y8l.eiii.  Ls’t  S  be  a  uonsiuguhu'  (n  -f  m)  X  (n  -f  Tr»)  matrix.  Tin*  solution  of  (3.3)  is  wiuivalent  to 
the  solution  of 


W<*  shall  consider  two  cotiiiiionly  usc<l  choices  for  iV  rlerived  from  the  LQ  factorisation  of  A: 
an  mx  m  lower-trimigular  matrix  L  and  an  n  x  n  matrix  Q  sucli  that 
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Assume  that  the  coluums  of  Q  aic  partitioned  so  that 


g  =  (r  z), 

where  Y  has  m  columns.  Then  let  5  be  given  by 


(3.6) 


cuid  let  Py  and  pg  denote  the  first  m  and  last  n  -  m  elements  of  p,  resp<x; lively.  Substituting 
from  (3.6)  into  (3.4),  we  obtain 


ryTHY  Y'^HZ 
Z'^HY  Z'^HZ 


(Y’^g\ 

<  c  / 


Thus,  p  and  p  may  be  found  by  solving  the  equations 


I4>Y  =  -e.  (3.7o) 

Z'^HZpt  =  -Z'^g  -  Z'^HYpy  (3.76) 

p  =  Ypr  +  Zpt  (3.7c) 

L'^p  =  Y'^{g+Hp).  (3.7d) 


Note  that  the  projected  Hessian  matrix  Z'^HZ  appears  explicitly  in  (3.76).  If  (3.2)  has  a  well- 
defined  solution,  this  matrix  is  positive  definite. 

W<'  consid(;r  two  definitions  of  L  and  Q.  In  the  methods  of  Cill  and  Murray  (1974),  Wright 
(1976),  Murray  and  Wright  (1978)  and  Gill  vt  nl.  (1984c),  L  and  Q  are  found  by  explicitly 
triangnlari/.ing  A  using  Householder  inat.rires  or  st.{ibilir.ed  elementary  iiicatrici's.  In  this  paper, 
we  consider  only  the  us<?  of  Householder  matrices,  in  which  case  the  matrix  Q  is  orthogonal. 

Computation  of  tlu'  LQ  f.-irtorixation  may  be  viewt^l  ns  iipdatiug  an  existing  fiu’torisation  as 
new  rows  are  ,'u1<1<h1  in  the  1,'ist  position.  Assume  that  the  LQ  fiU’toriKation  (3.5)  of  A  is  available, 
and  consi<ler  the  matrix  A,  which  is  A  augiiu'iiU'd  by  the  row  aT.  Then 

(t*'  s’*)’ 

where  t  and  n  ivrc  the  relevant  pnrtiti«)ns  of  Q^a.  L<'t  Q  denote  a  Hous(?holdcT  matrix  of  the  form 

Q  =  I-  iuii’’, 

where  the  vector  u  and  scalar  0  are  chosen  to  annihilate  ail  but  the  first  cleinent  of  s,  and  to 
leave  t  unchanged.  (For  details  of  how  these'  qmuititics  arc  dofiiUHl,  see  Stewart,  1973.)  Then 


or  AQ  =  {L  0  )  ,  where  Q  =  QQ. 
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The  80-callo<l  “standard”  LQ  factorization  is  a  version  of  (3.8)  and  (3.9)  in  which  the  rows 
of  A  are  added  to  the  null  matrix  one  by  one.  The  initud  Q  matrix  is  taken  as  the  identity, 
jmd  the  init  i<d  L  is  the  null  matrix.  While  computing  the  factorization,  the  sequence  of  House¬ 
holder  transformations  is  stored  in  compact  form  (i.e.,  Q  is  not  stored  explicitly);  the  vector  oTQ 
needed  in  (3.8)  is  obtained  by  applying  the  sequence  of  stored  transformations.  Once  the  initial 
factorization  1ms  been  compU'tcd,  the  necessary  explicit  matrix  Q  is  obtained  by  niultiplymg  the 
compact  Hotiseholder  transformations  together  in  reverse  order. 

A  second  choice  for  the  matrices  L  and  Q  involves  defining  Q  so  that 

Q'^HQ  =  I.  (3.10) 

hi  this  case,  Z'^HZ  =  /  and  Z'^HY  =  0,  and  the  equations  (3.7)  for  p  and  ft  become 

Lpr  =  -c  (3.11a) 

P2  =  -Z'^'g  (3.116) 

p  =  Ypr  +  Zpg  (3.11c) 

L'^ft  =  Y'^g  +  Pr-  (3.11d) 

The  recurrence  relations  (3.8)  and  (3.9)  may  bo  adapted  to  compute  Q  satisfying  (3.10)  by 
defining  the  initi<U  Q  matrix  to  be 

Equations  (3.11)  are  u.s<?d  to  solve  the  augmentoil  system  in  the  QP  method  of  Goldfarb 
and  Idnani  (1983).  Similar  techniques  have  bwn  suggi'stcMl  previou.sly  for  both  positive-definite 
and  iudi  finite  (piadratic  progrfUns.  In  the  latter  c.aso,  the  relationship  Z'^H  Z  =  D  in  maintained 
instead  of  (3.10),  where  tin;  matrix  D  may  be  diagonal  (Murray,  1971)  or  block-diagoiud  (Dunch 
and  Kaufman,  1978). 

3.3.  Properties  of  the  SQP  search  direction.  It  is  clear  from  (3.7c)  that  the  search  direction 
is  th<’  sum  of  two  vectors:  a  rangi'-space  component  p„  (=  Ypy),  and  a  null-simce  component  p^ 
(=  Zpx).  The  range-si)ac<’  vector  satisfies  the  underdel.erniined  equations  (3.26),  ainl  thus  defines 
a  stej)  to  tin*  linearized  versions  of  the  nonlinear  constraints.  (If  the  columns  of  Y  are  orthogonal, 
p„  tlefiiu's  the  step  t,o  the  iiotirrst  point  on  the  linearized  constniints.)  The  null-sp<u'e  compo¬ 
nent  p^  dc'lines  the  st  ep  from  Xk  +  Pt,  to  the  ininiinuiii  of  the  (piadratic  niodi'l  of  tlu'  Imgraiiginn 
function  in  the  siib.space  orthogonal  to  the  constraint  normals.  An  ('xjilicit.  distinction  between 
the  aims  of  satisfying  tin'  constraints  and  niiniinizing  tin'  Lagrangiau  function  is  important  be¬ 
cause  the  projiertics  of  the  equations  that  define  the  associaUnl  vectors  are  essentially  different. 
The  range-space  vtx’tor  is  a  Newton  step  in  the  sense  that  it  is  computi'd  using  ('xact  derivative 
information.  Dy  contrast,  the  null-simce  component  is  a  gihisi-Newtoi:  step  (h'iiued  using  ap- 
])roximat('  derivative  information  from  //*.  The  better  JU'cnracy  of  tin'  rauge-sp.u'e  component 
iin]>lies  that  ||c||  generally  remains  sinalh'r  than  ||^^i7||  the  solution  is  ajijiroached.  During  the 
fin<d  iterations,  the  behavior  of  SQP  methods  is  charact.('rized  by  the  n'lationship 
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i.c.,  the  final  search  directions  lie  almost  wholly  in  the  null  space  of  A. 

3.4.  The  definition  of  Hjb.  Based  on  its  success  in  the  unconstrained  case,  the  BFGS  formula 
(2.3)  seems  a  logicc-d  choice  for  updating  an  approximation  to  the  Hessian  of  the  Lagrangian 
function.  However,  the  definition  of  th(^  updating  formula  in  the  constrained  c^lse  is  complicated 
by  the  f<u:t  that  there  is  some  choice  <i8  to  which  matrix  should  be  approximated. 

An  important  feature  of  the  BFGS  update  in  unconstrained  optimisation  is  the  maintenance 
of  i)o.sitivc-tlefinitcness.  Li  the  constrained  case,  the  relevant  positive-definite  matrix  is  the  pro¬ 
jected  Hessian  Z^IIZ.  Accordingly,  the  first  class  of  methods  that  we  shall  consider  is  based 
on  inaiiitaining  a  quasi-Newton  approximation  Hm  to  the  proj<*cted  He.ssian.  There  arc  many 
closely-related  vmiaiits  of  this  approach.  For  example,  for  finearly  constrained  problems,  see  Gill 
tUid  Murray  (1074)  and  Murtagh  and  Saunders  (1978).  For  noulinemly  constrained  problems,  sec 
Murray  ami  Wright  (1978),  Coletmm  and  Conn  (1982),  Gabay  (1979),  and  Nocedal  and  Overton 
(1982).  A  typical  update  fur  methods  in  this  class  is 

=  -h  -^y*yr.  (3-13) 

where  barn'd  quantities  refer  to  the  updated  values,  Qj.  —  Z^g,  Vz  =  9z  -  Oxt  =  Z^(x  -  x) 
and  Uz  =  Hz»z-  For  these  projected  qiuisi-Newton  methods,  the  matrix  //  that  defines  the  QP 
subproblem  (3.2)  is  ZH^Z'^,  which  is  positive  stmii-definite.  A  common  feattire  of  the  projoctcxl 
quasi-N<!Wton  nu'thuds  mcntionetl  above  is  that  the  uull-sp.'u'e  compommt  of  the  scvu'ch  dir<x;tion 
is  defined  from  the  expiations 

f^zPx  =  ~gx 

in  ordi'r  to  avoid  the  n<;cessity  of  nxuirriiig  the  matrix  Z'^HY  (cf.  (3.76)). 

In  contrast  to  the  uncoustraiiuHl  ciuse,  it  is  not  always  possible  to  choose  a  step  length  that 
guar.'uitix's  the  condit  ion  >  0.  As  a  result,  there  may  be  iterations  in  whicii  Uu'  qmisi-Newton 
update  c.'uniot  be  performcil  b<x:au8e  of  loss  of  positivivilefiniti'iiess.  However,  since  (3.12)  implies 
that  the  si'arch  directions  will  lie  in  the  null  space  of  A  as  the  itc>rat<\s  conviTge,  is  typically 
]>ositive  in  the  neighborhood  of  the  siilution. 

All  iuimediati'  consequence  of  storing  only  an  approxiniation  to  the  projectetl  Hessian  is  that 
the  QP  inulti]>liers  ft  canniit  lx*  computed,  since  lh<?  full  matrix  //  is  nut  availabh’  (cf.  (3.7r/)). 
Howi'ver,  if  Q  is  orthogonal,  the  kvust-squares  nuiltipliers  A|.  at  a  point  x  may  be  calculated  from 
th<>  ixinations 

=  Y'^g{x). 

The  continuity  pro])erties  of  the  nssoriat.ed  Z  are  significant  in  projected  (pnusi-Ni'wton  meth¬ 
ods  bw;aiise  Z  defines  the  operation  of  proji*ctioii.  Fur  example,  in  proving  local  convergence  for 
algorithms  that  explicitly  utilise  Z,  it  is  essential  that  simdl  changes  in  x  should  lead  to  small 
changes  in  Z  (sei*,  e.g.,  Coleman  and  Conn,  1982a,  b;  and  Nocedal  and  Overton,  1982).  The 
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standard  method  of  computing  the  IXj  fac'torization  does  not  provide  a  continuous  representa¬ 
tion  of  Z(x)  (see  Coleman  and  Sorcnstui,  1984).  How<;v('r,  a  continuous  representation  of  Z  may 
be  defined  using  a  trivi<U  inudificatio}i  of  the  relations  (3.8)  and  (3.9).  Rc'call  that,  given  tlic 
factorization  (3.5)  of  the  matrix  A,  we  require  the  factors  L  and  Q  of  A  at  the  point  i  =  x  +  ap. 
Exactly  as  in  the  standiird  fa<!torization,  (3.8)  and  (3.9)  can  be  used  to  update  the  factors  as 
the  rows  of  A  arc  added  to  the  null  matrix  one  by  one,  except  that  the  explicit  matrix  Q  from 
the  previous  QP  suhprohU'm  is  taken  as  the  initial  matrix  Q  in  (3.8).  Eatdi  Hoiisehohlcr  tr£ms- 
formation  is  then  multiplied  into  Q  <ifter  the  corresponding  row  has  been  transformed.  With 
the  standard  procedure,  the  Householder  transformations  can  be  stored  in  compact  form,  but 
with  this  approat^h,  Ccich  new  row  of  A  must  be  transformed  by  a  full  orthogonal  matrix  (which 
is  somewhat  more  expensive  unless  some  of  the  constraints  arc  liuefU*).  The  benefit  is  that  Z  is 
continuous  at  a  point  when*  A  Ims  full  rank,  <uid  appro<u:hes  a  limit  when  computed  at  a  sequence 
of  points  {x«;}  converging  sufHcicntly  fast  to  a  suitable  ]>oint  x  (sen;  Gill  et  al.,  1985a). 

The  second  major  representation  of  Hk  is  as  a  quasi-Newton  approximation  to  the  full  Hessian 
of  the  Lirgrangimi  function.  (This  method  is  cspcx-ially  appropriate  if  the  calculation  of  the  search 
direction  requires  the  full  Hessian,  e.g.,  the  method  defined  by  equations  (3.11).)  Consider  a 
BFGS-typc  update  of  the  form 

H  =  H - L  (3-14) 

urs  v^a 

where  a  =  x  -  x  and  u  —  Ha.  Since  //  is  meant  to  €ai>proximat<*  the  Hessum  of  the  L<’\grangian 
function,  a  “natural”  choice  for  v  would  be  t^,.,  the  difference  in  gradicuits  of  the  Lagrangian 
function;  i.e., 

Vi.  =  0  -  9  ~  -  A'^)X, 

with  A  tak(?n  as  the  best  available  niultii>licr  estimate.  However,  since  x  is  not  an  unconstrainerl 
minimuin  of  the  Lagriuigiau  function,  it  may  l>e  impossible,  with  miy  liiu'scan  h,  to  find  a  step 
length  ftir  which  yjTs  is  positiv(;.  Hence,  the  update  might  be  skippe<l  ,-it  every  iteration,  which 
would  not  only  (h’stroy  llu'  locjil  convergence  properties,  but  also  advtTsely  affwt  the  ellicicncy 
of  tlu'  method  aw<iy  from  the  solution. 

A  ]>opular  uiel.ho<l  for  d«*<iliug  with  this  diiliculty  is  to  use  y,_  ,‘is  v  in  (3.14)  only  when  yj^a 
is  siiflicieiitly  positive;  otherwise,  w  is  taken  as  a  perturlunl  vc-ctor  y^  such  that  yja  >  0.  A 
p«rttirb.ation  that  we  have  found  to  be  quite  successful  in  pnvctice  is  defined  ns  follows.  When 
yj’s  <  0,  compute  the  scalar 

w  = _ _ . 

-’’(A^c-A’c) 

The  (]uantity  in  the  denominator  is  an  a]>]>roxiiiiat.ioii  to  (he  curvature  of  Hcjlj,  which  is  positive 
at  X* .  If  (i;  is  negative,  the  update  is  skipped;  otherwise, 
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whore  uf  is  <'U)y  positive  scalar  such  that  cj  >  ui.  The  motivation  for  this  choice  of  v  is  tlic  result 
(due  to  Boggs,  Tolle  aud  Wang,  1982)  that  a  ii(*ccssary  condition  for  ^-stiperlincor  convergence  is 
that  the  approximate  Hessian  matrices  must  satisfy 


lim 

fc— *oo 


l|Z*zn/rfc-V»L(/,A*))pfcl| 

llPi^ll 


=  0. 


3.5.  Inconsistent  and  ill-conditioned  constraints.  In  the  prcceduig  discussion  of  SQP 
methods,  we  httve  assumed  that  the  equations  (3.26)  arc  consistent.  When  till  the  constraints  of 
the  original  problem  are  linear,  th.,  subproblem  constraints  can  be  inconsistent  only  if  the  original 
problem  has  no  solution.  With  iionliiie.u'  constrauits,  however,  the  constr/unts  of  the  suhprobicm 
may  be  inconsistent  even  when  the  original  problem  has  a  well-posed  solution.  Tecliniques  for 
dealing  with  inconsistent  constraints  fall  into  two  major  categories.  In  both  cases,  the  search 
dir(H:tiun  satisfies  a  shifted  set  of  linear  constraints 

Ap  =  r-e.  (3.15) 


The  first  approach  generates  a  search  direction  designed  to  minimise  a  weighted  combiitation 
of  the  qtiadratic  approximation  to  the  Lagraugian  and  the  residual  vector  of  the  unperttirbed 
constraints  (the  vector  c  -I-  Ap).  The  compromise  is  achieved  by  choosing  p  as  the  solution  of  an 
niK-onstrained  problem  of  the  form 


ininiiui7.c  o{j/^p  +  ^p^ffp)  +  $(c  +  Ap), 


(3.16) 


where  $(r)  is  a  sc<dar-ViUued  function  that  measures  the  “size”  of  r,  and  a  is  a  non-negative 
scahar.  The  theon^ticiU  Inisis  of  tliis  class  of  niethf>d  is  that  a  iiiiniiiiuin  of  (3. 1C)  always  exists, 
even  wlnui  a  solution  of  tin*  QP  sidiproblem  (3.2)  <loes  not. 

The  {ilgorithnis  of  Diggs  (1972a),  Dartholoiuew-Diggs  (1982)  and  Fletcher  (1981)  c<)rn?si)ond 
to  choosing  $  as  the  two-nonn  ami  one-noriii,  rcTspictively.  In  »nt.her  cjwc*,  it  can  be  shown  that 
the  nnconstriiined  niinimiser  of  (3.16)  is  th<<  s«dutioii  of  .a  QP  with  ohjtrtive  function  (3.2a)  iuid 
constraints  (3.15).  The  foriii  of  the  v<rtor  r  depends  on  the  ileiinition  of  '!>.  If  «l>(r)  is  tleliiual  ns 
^||r||^,  all  the  compoiu'nts  <if  r  an*  gt'iierally  iie.i-/.eru,  aud  p  is  diitimal  by  (filiations  similar  to 
(3.11)  (se<'.  BmrtholoiiK'w-Diggs,  1982,  for  more  details).  If  ^(r)  =  ||r||i,  p  is  the  solution  of  an  (i- 
QP  (s<H’  Fletcher,  1981).  At  the  solution  of  this  problem,  a  subset  of  the  original  constraints  (3.26) 
will  be  satisfied  <;x<'u;tly  (i.e.,  the  corres]>onding  components  of  r  will  be  m'ro).  This  approach 
therefore  implicitly  “discards”  some  of  the  violaUal  constraints  from  the  snbproblcin. 

TIk'  methods  of  Dart  holomew-Diggs  and  Fletcher  an*  based  on  the  propert  ies  of  penalty  func¬ 
tions  (s<H?  FletchcT,  1983,  for  a  survey).  A  feature  of  th(*8c  triNatiiU'iits  of  inconsisti'iit  constraints 
is  that  a  is  always  non-7.cro  in  (3.16).  An  alternative  approach  is  to  define  the  composite  function 
only  wlmii  the  constraints  are  found  to  be  inconsistent.  Other  SQP  met  hods  with  a  strati^gy  of 
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this  type  have  been  suggested  by  Powell  (1977),  Schittkowski  (1983),  Tone  (1983)  and  Gill  et  al. 
(1984b). 

The  stM'ond  technique  for  the  treatment  of  inconsistency  is  based  on  the  observation  that 
only  the  range-space  portion  of  the  QP  seiU'ch  direction  is  ill-defined  when  the  constrciints  arc 
incompatible.  With  this  approach,  a  well-defined  procetlurc  is  used  to  compute  p,„  and  pz  and  p 
are  obtained  from  (3.76)  luid  (3.7c).  The  most  strmghtforward  application  of  this  approach  is  to 
define  p„  as  a  solution  of  the  linear  least-squares  problem 

minimise  ||c -1- /Ipnljj,  (3-17) 

PR 

which  gives  r  as  the  smallest  constraint  shift  (in  the  least-squares  simse).  This  choice  of  p^ 
is  equivalent  to  computing  the  first  iter.ate  of  the  Gauss-Newton  method  for  minimising  the 
two-norm  of  tin;  uonlincur  constraint  violations.  Therefore,  the  strategy  for  dealing  with  the 
constraints  has  changed  from  the  possibly  niisolvablc  problem  of  finding  a  point  such  that  c(x)  =  0 
to  the  always  solvable  problem  of  minimising 

Since  A  must  be  rnnk-d<>ficient  when  the  constraints  arc  inconsistent,  the  solution  of  (3.17) 
is  not  nniciuc.  A  suitable  choice  of  p„  in  this  case  is  the  minimum-length  solution,  which  can  be 
computed  u.sing  the  complete  orthogonal  factorization: 


where  I'  and  Q  arc  orthogonal  matrices  mid  L  is  a  lower-triangul.u'  matrix  whose  dimension  is 
I’qual  to  the  rank  of  A. 

Unfortunately,  neither  of  these  ti'chiiiqiKis  resolvt's  the  difficulties  caused  by  constraints  that 
are  almost  inconsistent  (i.i\,  ill-condition«'d).  Bi-conditioning  in  A  will  tend  to  cause  p„  to  bo  large 
in  norm  (see  (3.7a)).  In  thes(<  situations  it  is  lu'ci'ssiu'y  in  pnactiiT  to  limit  the  norm  of  p.  It  might 
appeiu*  that  the  least-length  solution  of  (3.17)  would  automatically  be  satisfactory.  However,  the 
coiiquitation  of  p  using  tlu*  complete  orthogonal  factorisation  involvc's  serious  priu  tical  dilRculties 
in  particuliu',  a  strategy  must  be  included  for  I'stiniating  the  rank  of  A.  It  is  well  known  that 
the  di'fiiiitioii  of  nuiiierical  “riiiik”  is  jirohh'ni-«lep<'iuIent.  Th«*  rank  ran  never  In*  di'tenniiu'd 
without  making  an  exjilicit  judgnient.  about  si-aliiig,  i.e.,  a  di'cision  as  to  which  (luantit.iim  can  be 
cf'nsiilen'il  “negligible’’.  The  choice  of  rank  is  critical  in  the  Gauss-Newton  method  because  a 
slight  idteration  in  the  valiu;  of  the  tol<?rancc  used  to  estimate  the  rank  may  lead  to  completely 
different  behavior. 

If  the  composite  function  (3.1G)  is  used,  an  explicit  bound  on  the  norm  of  p  may  be  enforcc'd 
by  temporarily  imposing  a<1<litionnl  constraints  <»ii  the  problmn.  (This  type  of  prociMlnre  is  used 
within  trnst-n'gion  {ilgorithms  for  unconstrained  optiini7.ntion.)  The  effect  of  the  trust-region 
constraints  is  to  modify  (iin])licitly  or  explicitly)  the  derivative  inforimition  that  dt'fines  the  search 
direction.  For  <*xnmple,  if  a  temporary  bound  is  placed  on  the  two-norm  of  p,  the  search  <lircction 
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satisfies  equations  in  which  the  second-derivative  approximation  is  modified  by  a  multiple  of  the 
identity  matrix.  Thus,  Z'^IIZ  and  A  arc  implicitly  modified  -  an  mifortunatc  result,  since  we 
would  prefer  the  projected  Hessian  approximation  to  be  independent  of  the  conditioning  of  A. 

The  development  of  stable  robust  methods  for  dealing  with  ill-conditioned  constraints  is 
still  an  active  area  of  rcserurch.  One  possible  approach  was  suggested  in  Section  1.2,  where  small 
perturbations  of  constraints  were  used  to  resolve  inconsistencies  caused  by  modelhiig  in;iccuracic3. 
By  changing  the  constritints  (3.2b)  to  suitable  perturbed  inequality  constraints,  (3.2)  always  has  a 
bounded  solution.  For  example,  consider  defining  pn  as  the  solution  of  the  inequality-construed 
quadratic  program 

minimise 

PKfcSl" 

subject  to  —S  <  Ap„  +  c  <  S,  (3.18) 

where  £  is  a  vector  of  small  quantities  that  are  forced  to  approach  the  feasibility  tolerances  for 
the  original  constraints  as  x  approaches  x.  The  subset  of  constraints  active  at  the  solution  of 
(3.18)  may  then  be  used  to  define  Z  and  Y,  from  wliich  p^  and  p  can  be  computed  using  (3.7). 

4.  Methods  for  nonlinear  inequality  constraints 

4.1.  Background.  In  the  final  problem  to  be  considered,  all  the  constraints  are  uonliitear 
inequalities: 

NIP  minimise  F(z) 

xcR" 

subject  to  c<(i)  >0,  »  =  1, . . .  ,m. 

We  consider  this  simplified  form  in  order  to  concentrate  on  the  treatment  of  nonlinear  iiuxiiic’ility 
constraints. 

Let  the  matrix  A(x)  denote  the  Jacobian  of  c(x).  The  constraint  c,-  is  Sciid  to  be  active  at 
X  if  c,(z)  =  0,  ainl  violated  if  c,(i)  <  0.  The  Kului-Tuckt'r  conditions  for  NIP  are  similar  to 
those  for  the  tMju.'ilily-coiistriiint  case,  <'xc<<{>t  tli.at  tlmy  involve'  itiily  constraints  .u'tive  .it  x,  ami 
iui)>ose  a  sign  r<*strictiou  on  tlur  Ij<igraiig<'  iiiultipliers.  The  iiitijor  dilk'n'iice  betwtvn  iiuHimility- 
aiid  <Hiunlity-cunstr<unc<l  problems  is  that  the  s<'t  of  constraints  active  at  the  solntioii  is  unknown 
in  the  ineqmility  case.  Thm?forc,  algorithms  fur  NIP  must  include  a  procedure  (termed  an  active- 
set  strategy)  that  <lctcrmine8  the  correct  active  set  —  usually,  by  maintaining  a  working  set  that 
estimates  the  active  set.  In  this  section  we  discuss  the  mlditiuiud  idgorithmic  complexity  in  SQP 
methods  that  arisi's  spc'cifically  from  the  presc'iice  of  ine«|unlity  constraints. 

4.2.  Formulation  of  the  QP  subproblcm.  Oroailly  speaking,  two  extreme  tyjxw  of  QP 
subproblems  can  be  posed  when  solving  inequality-constrained  problems.  The  first  —  called  an 
JQP  strategy  —  corresponds  to  representing  aii  nonlinear  inequality  constraints  ns  ine(]iiaiitics 
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in  the  QP  subproblcm;  this  has  been  by  far  the  most  widely  used  formulation  in  published  SQP 
methods.  The  standard  IQP  search  direction  is  the  solution  of 


minimise 

pe»“ 

subject  to 


o’?  +  5P’’^P 
Ap  >  —c. 


(4.I0) 

(4.16) 


where  H  is  mi  approximation  to  the  Hessian  of  the  Lagrangian  function  and  A  is  the  Jacobian  of 
c(x)  cvjJuated  at  the  current  iterate.  In  general,  the  solution  of  (4.1)  must  be  found  by  iteration. 
Thus,  the  structure  of  an  SQP  method  with  an  IQP  strategy  involves  major  and  minor  iterations 
-  -  the  minor  iterations  being  those  of  the  quadratic  prognunming  method. 

Because  (4.1)  includes  all  the  constraints  of  NIP,  it  is  convenient  to  take  the  active  set  of 
the  QP  as  a  prediction  of  the  active  set  of  the  nonlinearly  constrained  problem.  Tin;  theoretical 
justification  for  this  strategy  is  that  the  QP  (4.1)  will  make  a  corrcHrt  prediction  of  the  active 
sot  of  the  original  problem  in  .a  neighborhooil  of  x*  for  any  bounded  positivivdefinite  matrix  H 
(Robinson,  1974).  Furthermore,  tl  e  multipliers  of  (4.1)  approach  the  multipliers  of  NIP  as  the 
iterates  converge  to  x,  and  hence  it  is  common  to  take  the  QP  multipliers  as  the  next  multiplier 
estimate. 

The  second  extreme  form  of  subproblem  in  SQP  methods  involvc's  a  QP  with  only  cr/nality 
constraints.  In  ord<?r  to  use  an  EQP  strategy,  some  determination  must  bo  made  Indore  posing 
the  QP  as  to  which  constraints  arc  to  be  includeil.  An  EQP  method  should  have  lh<'  property 
that  it  will  seK'ct  the  cornH-t  active  set  in  some  n(‘ighborhood  of  x*  Therefore,  such  methods 
tend  to  t'lioosc  constraints  that  satisfy  properties  of  the  active  constraints  in  a  neighborhood  of 
the  solution  e.g.,  are  “siiiair  in  inagnitudc,  or  satisfy  the  sign  re(|uireinents  for  the  Lagrange 
multiplier  ("stimates.  A  benefit  of  an  EQP  method  h.  that,  in  general,  (he  subprublem  will  be 
easier  to  solve  th<ui  one  with  inequality  constraints. 

To  a  large  extent,  the  active-set  strategy  will  determine  the  choice  of  quasi-Newton  update 
and  Lagrange  niidtiplier  c'stiniate.  For  example,  if  an  IQP  strategy  is  list'd,  the  methotl  used  to 
solve  (4.1)  will  require  specification  of  the  full  malrix  //.  On  the  other  hand,  iui  EQP  strategy 
is  usually  iinplenientt'tl  with  an  approximation  to  llie  projectt'tl  Hessian.  Tin*  following  Uible 
snminari/.es  the  major  features  of  the  two  tictivt'-sei  strategies. 


EQP 


IQP 


QP  subprublem: 

g^p  +  ^p^JIp 


QP  snb])roblcra: 


numiniisc 

PC«" 


minimise 

pcR" 


g^p  +  ^p^ffp 


subject  to  Ap=  -c. 

•  Least-stpiares  multipliers. 

•  ProjtTtetl  Hessian  approxiniatiou. 

These  two  active-set  strategies  are  the  extremes  of  a  whole  range  of  possibilities.  Other 

mi'thods  have  bi'en  defined  that  use  features  from  both  approaches.  For  example,  the  method  of 


subjc'ct  to  Ap  >  -c. 

QP  multipliers. 

Full  Hi'ssian  appniximation. 
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Schittkowski  (1981)  solves  a  QP  subprobicm  of  the  form  (4.1),  but  uses  a  pre-assigned  working  set 
to  specify  which  of  the  constraint  gradients  should  be  nHroinputed  for  the  next  major  iteration. 
Similarly,  it  is  possible  to  treat  linear  constraints  with  an  EQP  approach  and  nonlinear  constraints 
with  an  IQP  approach  (sec  Gill  ct  al.,  1984d). 

It  is  important  to  note  that  IQP  methods  can  be  implemented  so  that,  as  the  solution  is 
approach(?d,  the  mnount  of  overhead  per  major  iteration  is  the  same  as  for  an  EQP  method  (i.e., 
solution  of  a  single  set  of  equations  of  the  form  (3.3)).  This  ciuj  be  achievtul  by  solving  (4.1)  with 
a  QP  method  that  allows  the  active  set  from  one  subproblem  to  be  used  to  initiali/.c  the  next. 
Since  the  active  set  of  the  subproblem  eventually  becomes  the  correct  active  set  for  the  noidincar 
problem,  QP  stibprohlcias  near  the  solution  reach  optimality  in  only  one  minor  iteration. 

4.3.  Active-set  strategies  in  quadratic  programming  methods.  Qinadnatic  programming 
methods  for  problems  with  ineqmility  constraints  solve  a  sequence  of  problems  (3.3)  in  which  the 
constraints  in  the  working  s<!t  arc  treated  as  equalities.  The  major  differences  among  QP  methods 
arise  from  the  numerical  procedures  for  solving  the  associated  linear  equations,  and  the  strategics 
that  control  changes  in  the  working  set.  (Modern  QP  methods  are  surveyal  by  Fletcher  (1981) 
and  Gill,  Murray  and  Wright  (1981).) 

We  briefly  outline  two  methods  of  selecting  constraints  for  the  working  set  when  solving  (4.1). 
At  <'ach  iteration,  let  p  €'uid  A  denote  the  current  estimates  of  the;  solution  and  optimal  multiplier 
v«Ttor,  and  let  r  denote  the  residual  vector  r  =  Ap  +  c.  The  “hat"  notation  indicates  quantities 
.issociahHl  with  the  working  s<.'t.  Th<!  vector  is  defincvl  such  that  p  +  is  the  .solution  of  a  QP 
with  tin?  <irigin.'d  obj<x'tivo  function,  subject  to  the  constraints  of  the  working  set  held  at  (sjuality, 
and  M  denotes  the  corresponding  clnuige  in  the  multiplier  estimate.  As  indicated  in  S(?«  tion  3.2, 
and  fiA  are  l.he  solution  of  (.he  linear  system 

wIkto  r/g  denotes  (j+  lip,  th<*  gradient  of  the  <{undratic  function.  The  algorithms  considered  here 
always  attempt  hi  move  from  the  minimuin  on  one  working  set  to  the  minimum  on  nnoth(?r  by 
l-aking  steps  of  tin*  form  p  +  /ijt  .-ui<i  A  +  AA.  However,  the  niainfen.'Uice  of  certain  properties  of  the 
working  s«'t  can  cans<’  a  shqi  «  (0  <»»<!)  to  be  hikeii,  wluTe  ir  <lep<>n<ls  upon  th<*  a<'tiv<'-s<’t 
str.'iti'gy  being  us«?<l. 

hi  an  artivi^set  feasible-point  QP  method,  p  is  feasible  (cj  >  0  for  all  i),  but  A  is  not  dual- 
feasible  (i.e.,  A<  <  0  for  at  least  one  i).  Changes  in  the  working  set  designed  to  imiiiitain 
f<‘asibility  of  p,  but  to  move  iiih'rior  to  constraints  that  have  negative'  Lagrange  multi[>li('r8.  At 
a  typical  it4?ratioii,  the  working  set  roin]>rises  the  constraints  satisfied  e'xactly  at  p  (i.e.,  r  =  0  in 
(4.2)).  If p  + remains  feasible  (i.e.,  ri-j-aft>/i  >  0  for  i  not  in  Ihe  working  se't),  then  the  full  step 
of  unity  is  permitted.  A  constraint  with  a  neg.ative  multiplii'r  (usually,  the  most  negative)  is  then 
deleteel  from  the  working  set.  Otherwise,  a  is  taken  as  the  smallest  step  such  that  the  n?sidual  of 
a  constraint  not  in  the  working  set  becomes  sero  at  n,  and  tin?  corresponding  constraint  is  added 
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to  the  working  set.  For  more  details  concerning  the  implementation  of  feasible-point  quadratic 
programming  methods,  see  Gill  and  Murray  (1978)  and  Gill  ct  iiI.  (1084a,  19851)). 

The  second  strategy  is  typical  of  thial-fcAsihlc  activo-set  methods.  In  these  methods,  p  is  not 
feasible  (i.e.,  some  r,-  <  0)  but  A  is  iilways  dual-fetisiblc*  (all  A,-  >  0).  Changes  in  tin*  working  sot 
are  designed  to  maintain  non-negative  multi]>ii<‘rs  while  moving  to  satisfy  the  violated  constraints. 
At  the  beginning  of  a  typical  iteration,  <ill  the  constraints  in  A  are  .satisfied  c'xactly  (except  one 
(i.e.,  r  is  a  multiple  of  a  unit  vector  in  (4.2)).  The  .step  lejigth  a  is  taken  as  t)n<!  if  p  +  ^  is  dual- 
feasible  (i.e.,  A,  -I-  5A,  >  0).  Otherwise,  a  is  clM>sen  as  the  largest  stop  such  that  A,  -|-  nSXi  =  0  for 
an  index  i  in  the  working  set,  and  the  corresponding  constraint  is  deleted  from  the  working  set. 
After  a  unit  step  is  taken,  a  constraint  with  a  lurgative  residual  (tisiially,  the  most  negative)  is 
added  to  the  working  set.  (Note  that  wo  have  given  a  considerably  siinplilic'd  descrii)tion  of  the 
dual- feasible  iteration  in  or<ler  to  cnipluisi/.e  the  similarities  between  dual-  mid  primal-feasible 
methods.  In  jiractice,  Sj)  and  ^A  are  not  found  directly  from  (4.2)  IxHraiiso  the  new  constraint  may 
be  depi'iideiit  on  the  constraints  already  in  the  working  sc't.)  For  further  information  concerning 
the  implementation  of  <lual-feasiblc  qiiailratic  programming  methods,  sec  Goldfarb  and  Idnani 
(1983)  and  Powell  (1983b). 

For  both  of  these  active-set  strategies,  each  change  in  the  working  set  leads  to  a  simple 
chiuige  to  A,  which  in  turn  lea<(s  to  a  cluuigc  in  the  factorizations  used  to  solve  (4.2). 

Doth  of  the  active-set  strategies  desi'ribed  nHjuire  an  initialization  prorrdtirc  to  obtain  .an 
initial  primal-  or  du<il-fe<isibl<'  ]>oiut.  As  nottnl  above,  for  efficii'iury  within  an  Sl^P  method,  it  is 
critic.al  that  this  jiroceduri!  should  be  .able  to  utilize  a  pre-.assigiu’d  working  set. 

The  initiali/.atiou  jirocedure  for  th<’  ]>riin<a1  femsible-point  mi'thod  is  eqiiiv.aleiit  to  .a  line.ar 
progr.auiming  ))roblem.  Consider  the  sum  of  iiife.-i.sibilitu's 

v(p)  =  -  Z)  («fP  +  c<)- 

aj'plc.<0 

Note  that  v(p)  is  ,a  liiii'ar  function  th.at  is  zero  .at  .any  fe.-isible  jiuint,  .and  jiosil.ive  at  .an  infejisiblc 
])oint.  Therefore,  a  fe.asible  point  can  Ix'  found  by  ininiinizing  v(p),  subjirt  to  continuing  to  s<atisfy 
the  constraints  with  positive  residuals  .at  p.  The  ruiiction  v[]>)  may  be  minimized  using  .an  act ivi*- 
si't  strategy  that  is  .almost  identic.al  to  th.at  of  the  re.asible-point  .ai  tive-set  method.  The  ]>rincip.'il 
differences  an;  th.at  the  search  dirc*ction  is  defined  jis  ~ZZ^Vt)(p),  .and  o  is  chosen  min(ai,  arj), 
wlK're  oi  is  the  in.aximum  stc'p  th.at  c.ui  be  taken  without  violating  one  of  the  constraints  that 
is  curn'iitly  .s.atisiied,  and  02  reaches  the  furthest  constr.aint  along  6j>  that  is  currently  viol.atwl. 
(Sever.al  violated  constr.aints  may  IxvoiiK'  satisfieil  (hiring  a  single'  iter.ation.)  For  efficiency,  the 
impleuient.at  ion  of  this  procedure  should  rellcTt  the  siinil.arity  of  the  linear  algebraic  conqiut.ations 
ass(M'i.at<>d  with  iterations  in  both  the  fi'asibility  .and  QP  pluases  in  p.art  icul.ar,  e.u'h  itx'r.ation 
involvc's  an  update  of  the  s.ame  factoriz<ation  of  the  working  set.  The  computations  in  both 
phases  in.ay  b<'  performed  by  exactly  the  s.ame  program  module's.  The  two-pluise'  mature  of  the 
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algorithm  is  reflected  by  changing  the  function  being  minimized  from  the  sum  of  infcasibilitics 
to  the  quadratic  objective  function.  An  important  feature  of  this  type  of  implementation  is  that 
if  the  pre-assigned  working  set  is  similar  to  the  active  set,  just  ci  few  changes  in  working  set 
arc  necessary  to  achieve  feasibility.  In  particular,  if  the  initial  point  is  feasible,  the  procedure 
merely  computes  all  the  relevant  factorizations  (which  arc  also  needed  for  the  QP  iterations)  and 
performs  a  feasibility  check. 

If  a  dual-feasible  active-set  strategy  is  usc<I,  the  following  initiidization  procedure  may  be 
employed.  The  procedure  is  based  on  finding  a  subset  of  the  pre-assigued  working  set  on  which 
the  multipliers  arc  positive.  First,  the  minimum  of  the  quadratic  on  the  pre-assigned  working  set 
is  computed  by  solving  (4.2)  with  p  =  0  and  A  =  0.  If  the  arc  non-negative,  the  initial  point 
can  be  taken  aa  p  =  6p  and  A  =  SX.  Otherwise,  a  constraint  with  a  negative  multiplier  is  deleted, 
the  factorizations  are  updated  and  (4.2)  is  solved  agiiin.  This  process  is  repeated  until  ttll  the 
multipliers  arc  non-negative  or  the  working  set  is  empty,  in  which  case  p  =  &p  and  X  —  6X  define 
the  required  initial  point.  (Note  that  the  unconstrained  minimum  is  trivially  dual  feasible.) 

An  alternative  initialization  procedure  is  to  start  at  the  unconstrained  minimum  and  give 
preference  to  adding  the  pre-assigned  constraints.  However,  if  the  i)re-assigned  working  set  is 
similar  to  the  active  set,  this  scheme  is  likely  to  require  more  work  than  the  procedure  above. 
First,  more  operations  itfc  required  to  compute  the  factorizations  by  ujxlating.  Second,  even  if 
the  prc-a8sigue<l  working  set  defines  the  optimal  fcttsiblc  point,  the  ntimbcT  of  QP  iterations  may 
not  be  equal  to  tlie  Klimeusion  of  the  optimal  working  s<'t,  since  it  cannot  he  guarantcH’d  that  the 
multipliers  will  remain  dmU-fciisible  during  (he  iuterincdiate  iterations. 

4.4.  Conditioning  of  the  working  set.  One  of  ll..  most  important  issui's  in  tlie  implemen¬ 
tation  of  (jP  algorithms  is  robust, iicss.  During  tlie  solution  of  a  nonlimvu’  problem,  tpiadratic 
suhproblems  of  wildly  varying  degrws  of  ilifliculty  are  generatwl  “aiitom.itically” .  Even  if  the 
original  noidiiiear  probh'in  is  w<rll-c<iiiditioned  in  the  lu’iglihorhood  of  tin*  .solution,  tlu'  QP  siib- 
])rohlems  of  the  early  iti'rations  may  be  very  bailly  behavcsl.  The  most  common  ilifliciilties 
inchule  singular  or  ne.arly  rank-d<'ru'i<'ut  .lacohians,  subprohlenis  wit  h  vi-ry  small  fe.-isihle  regions 
and  .si'verely  ill-conditioned  ITessian  iimtrices.  (P'or  examples,  se«<  SiTtion  5.) 

One  of  (Ik'  most  cridc.d  fe.'itnres  of  a  QP  iinpleiiieulatiou  is  (he  str.d.egy  for  in.'iiut.iiniiig 
a  well-conditioned  working  si>t.  The  s|ns'tr<'il  condition  iiuinher  of  A  provides  a  iiuvisure  of  the 
d«’gree  of  independence  of  l.lu’  constraints  in  the  working  s<*t.  This  numlx'r  (tlu'  rat  io  «>f  t  he  largest 
to  simUlest  singular  values  of  A)  will  ihxTcase  whc'n  a  constraint,  is  di'leted  from  the  working  set 
and  increase  when  a  constraint  is  .'uldeii.  The  worst  rase  occurs  when  the  new  working  set  is 
singular,  i.e.,  an  attimipt  is  ma«le  to  add  a  constraint  that  is  ilejiemlent  on  constraints  already 
in  the  working  sc't.  Howc'Vit,  if  a  near-<tepeiideiit  constraint  is  addetl  t.o  the  working  st't,  the 
I'oinlition  nuinher  may  incretise  suhst.'uit.ially.  Accordingly,  it  is  important  that  tin*  constraint- 
sehretion  procedure  should  considi’r  the  condition  nuniber  of  the  new  working  set. 

With  <'x.-M;t  arithmetic  ami  a  noii-siiigiilar  initial  working  set,  tin?  active-set  strategy  de- 
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scribed  above  for  the  primal-feasible  method  would  never  generate  a  singular  working  set.  To 
sec'  why,  recall  that  ASp  =  0  at  every  iteration.  Thus,  6p  will  never  intersect  a  constraint  that  is 
exfictly  linearly  d«!pendent  upon  A.  In  practice,  of  course,  the  difficulty  eirises  when  the  candidate 
constraints  are  nearly  dependent.  Determination  of  the  condition  number  requires  the  singular 
values  of  A,  which  would  be  too  expensive  to  coiiipiitc.  Instead,  a  QP  method  can  use  an  inex¬ 
pensive  condition  estimator  —  for  example,  the  ratio  of  the  largest  to  smallest  diagonals  of  the 
LQ  factor  L  (ne'e  (3.5)),  which  is  a  lower  bound  on  the  condition  number  of  A. 

Exercising  control  over  this  condition  estimator  turns  out  to  particularly  easy  in  a  primal- 
feasible  active-set  method,  if  it  is  acceptable  to  violate  constraints  by  a  small  tolerance.  Suppose 
that  each  constraint  has  an  associated  uscr-defiued  tolerance  that  specifics  the  m.aximmn  permis¬ 
sible  constraint  violation.  Lot  <lenoto  the  m<ucimum  step  at  which  p  -h  docs  not  violate 
any  constraint  by  more  than  its  feasibility  tolerjmce.  All  constrauits  at  distance  «  (a  <  a„) 
from  the  current  point  arc  then  vicwc<l  tis  eicccptablc  candidates  for  inclusion  in  the  working  set. 
A  criterion  that  we  have  found  to  be  particularly  successful  in  practice  (duo  to  Harris,  1973)  is 
to  add  tin?  constraint  whose  normal  makes  the  largest  angle  with  the  search  direction.  In  the 
case  whcr<?  the  null  space  of  A  is  of  <limension  one  (for  example,  in  the  simplex  method  for  linear 
programming),  this  choice  gives  the  smallest  condition  estimator  over  the  candidate  set. 

An  unsatisfactory  feature  of  the  Harris  scheme  is  that  all  the  constraints  active  at  the  so¬ 
lution  tend  to  be  violated  by  their  feasibility  tolerances,  oven  when  the  final  active  set  is  not 
ill-conditioned.  Howcv<;r,  this  idea  can  be  gcneraliz<!d  so  that  constraint  violations  by  S  arc  per¬ 
mitted  wlw'n  necessary  to  improve  the  conditioning  of  the  working  set,  but  an  .ittcmpt  is  also 
made  to  minimi/.e  the  constraint  violations.  With  this  strategy,  the  constraints  Jictive  at  the 
solution  tend  to  be  satisfied  exactly  rather  than  violated  (sec  Gill  et  al.,  1985b).  (An  interesting 
n’snlt  is  that  negativ<'  steps  are  soinetiiiies  necessary.) 


5.  Sample  runs 

In  this  section  we  shall  use  several  ('xamples  to  illustrjite  the  i)crformanco  of  iioiiliiK‘ar  program¬ 
ming  soft  ware  on  i)ra<’ti<-al  probh'iiis.  All  the  problems  were  solve<l  in  double  precision  on  an  IBM 
3081  using  the  VS  Fortran  coiiij)iler  with  uptiiiiirMition  level  3. 

5.1.  A  comparison  of  IQP  and  EQP  methods.  The  purpose  of  the  first  .set  of  runs  is  to 
illustrate  the  properties  of  nu'thods  based  on  the  EQP  and  IQP  ju;tivc-8ct  strategies.  Two  sptxific 
methods  for  linearly  constrained  optimization  arc  considered.  In  order  to  aid  the  comparison,  the 
iiK’thods  have  several  features  in  comm<»n.  Both  methods  recur  mi  orthogonal  factorization  of 
the  constraints  in  tlie  working  set  and  begin  by  conqmting  a  feasible  point.  Both  methods  treat 
simple  bounds  and  linear  constraints  separately. 

Tht!  first  method  is  a  standard  IQP  method  for  linearly  constrained  optimization.  The 
Hessian  of  each  QP  subproblem  is  a  positive-definite  DFGS  approximation  to  the  full  Hessian  of 
tlu’  obji'ctive  function.  At  each  iteration,  a  steplength  is  computed  th.at  satisfies  the  linesearch 
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conditions  (2.5a)  and  (2.56).  Each  QP  subproblcm  is  solved  using  a  feasible-point  active-set 
method  with  im  orthogonal  factorization  of  the  constraints  in  the  working  set.  The  fined  working 
set  from  the  QP  of  one  iteration  is  used  as  the  initial  working  set  for  the  next.  For  additional 
detmls  concerning  the  Fortran  implementation,  the  reader  is  referred  to  Gill  et  al.  (1984b). 

In  onr  discussion,  we  shall  refer  to  the  output  printed  during  the  run.  A  single  line  of  output 
is  printed  at  the  end  of  each  major  iteration.  The  major  iteration  number  is  given  in  the  first 
colunm  (markoil  “ITN”).  The  next  coliiiim  “ITQP”  gives  the  number  of  minor  iterations  needled  to 
solve  the  QP  subproblcm.  The  “STEP”  column  gives  the  step  at  taken  along  the  computed  search 
direction.  “NUMF”  is  the  total  number  of  evaluations  of  the  problem  functions.  “OBJECTIVE”  is 
the  value  of  the  objective  function,  F{xk)-  Coliunns  “BND”  and  “LC”  give  the  numbers  of  simple- 
bound  constraints  and  general  linear  constraints  in  the  working  set.  “)fZ”  is  the  dimension  of 
the  null  space  of  the  current  matrix  of  constraints  in  the  working  set.  The  next  five  entries 
give  information  about  the  derivatives  of  the  problem  at  the  current  point.  “NORM  GF”  is  the 
two-norm  of  the  free  components  of  the  objective  gradient  gjc,  and  “NORM  GZ”  is  the  two- norm  of 
^kSk-  “COND  H” ,  “COND  HZ”  and  “COND  T”  are  estimates  of  the  condition  numbers  of  the  Hessian, 
projected  Hessian  and  imatrix  of  constraints  in  the  working  set.  “CONV”  is  a  set  of  four  logical 
variables  Ci,  C2,  C3  and  C4,  used  to  inform  the  user  of  the  quaUty  of  the  current  estimate  of 
the  solution,  with  the  following  meanings.  Ci  is  true  if  the  projected-gradient  norm  is  small;  Cj 
is  true  if  constraints  are  satisfied  to  within  the  user-specified  tolerance;  Cj  is  true  if  the  signs  of 
the  multipliers  indicate  optimality;  and  C^  is  true  if  the  last  change  in  x  was  small.  Finally,  in 
some  of  the  runs  <ui  “S”  is  printed  iis  the  l<ist  itiun  of  the  iteration  summary.  This  indicates  that 
it  was  nec(;ssary  to  skip  the  BFGS  update  to  the  approximate  Hessian. 

The  scvoiul  iiiipleiiieiitatioii  is  basH  <in  cHi  EQP  artivivset  strategy  usimI  in  roujiiiirtion  with 
a  BFGS  apiiroxiiiiatioii  of  tlw;  pnijected  Ih's.siaii  Z'^HZ.  Tli<*  EQP  is  .solvc<l  using  the  orthogonal 
LQ  fiu'torization  mid  the  Cholcsky  factorization  of  the  pnijc'ctcd  Hesshm  (sch*  (3.3)  and  (3.5)).  If 
the  obji'ctive  function  is  ihx'reasiiig  at  the  step  to  the  nean'st  satisfied  constraint,  the  constraint 
is  julded  to  tli<?  working  set.  Lagrange  multqiliers  arc  computed  when  the  projectcxl-gradient 
norm  is  less  than  some  loose  tolerance.  If  the  smallest  multiplier  is  lu'gative,  the  corresponding 
constraint  is  deleted  from  the  working  set. 

Tlur  iterat  ion  siiininary  is  the  same  as  that  for  the  IQP  method  exce]>t.  that  only  thi'  condition 
(estimate  of  the  proj<'ctvil  H<?8siaii  is  print'd  (the  full  Hessian  is  not  recurreil),  and  ailditional 
information  is  printed  about  each  change  to  the  working  set.  When  the  status  of  a  constraint 
changes,  the  index  of  the  constraint  is  printeil,  along  with  the  dirsiguation  “L”  (lower  bound),  “U” 
(upper  bound)  or  “E”  (equality).  Indices  1  through  “N”  refer  to  the  bounds  on  the  variables,  and 
the  nunainiiig  indices  refer  to  the  general  constraints.  “KDEL”  and  “KADD”  denote  the  indices  of 
the  constraints  h'aving  and  eiitoring  the  working  8(>t.  If  mi  entry  in  one  of  these  columns  is  zero, 
a  constraint  was  not  delet'd  or  added.  “MIN  LM”  is  the  inultijilier  associated  with  the  constraint 
just  deleted.  If  no  constraint  was  deleted  during  the  rclevmit  iteration,  the  entry  in  this  column 
is  The  information  printed  in  the  “CONV”  colunm  is  different  from  that  given  in  the  IQP 
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method.  Ci  is  true  if  the  projected-gradient  norm  is  smaller  than  some  loose  tolerance;  Cq  is  true 
if  the  projected-gradient  norm  is  smaller  than  some  tighter  tolerance;  (7$  is  true  if  the  change  in 
the  objective  function  was  sufficiently  small;  and  C4  is  true  if  the  change  in  z  is  snnill.  Note  that 
the  loose  tolerance  on  the  projected-gradient  norm  must  be  satisfied  before  any  multipliers  are 
computed  and  any  constraint  is  deleted. 

Figure  1  gives  the  output  from  runs  on  a  wdl  behaved  seven-variable  linearly  constrained 
problem  with  .seven  general  linear  constraints  ixnd  upper  and  low(;r  bounds  on  the  variables.  At  the 
solution,  two  boiuids  and  three  general  Unear  constraints  are  active.  A  comparison  of  the  output 
from  the  two  runs  will  illustrate  one  of  the  major  differences  between  IQP  and  EQP  active-set 
strategies  —  the  different  pattern  of  changes  to  the  working  set.  hi  an  EQP  method,  the  problem 
functions  arc  computed  after  every  stop  along  a  search  direction.  Moreover,  a  constraint  wUl  be 
deleted  from  the  working  sot  only  when  the  current  point  is  considered  to  be  sufficiently  close 
to  the  minimum  on  the  current  working  set  (albeit  to  a  very  low  accuracy).  In  other  words,  a 
constraint  will  be  deleted  only  when  the  method  has  accumulated  sufficient  information  about 
the  curvature  of  the  problem  on  the  current  working  sot.  By  constrast,  many  constraints  may  be 
added  or  deleted  during  a  single  major  iteration  of  an  IQP  method.  In  the  nut  given  in  Figure 
la,  the  IQP  strategy  finds  a  very  close  approximation  to  the  correct  active  set  during  the  first 
subproblem. 
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During  the  final  few  iterations,  the  methods  are  essentially  identical,  with  a  single  minor 
iteration  being  performed  in  the  IQP  method.  Note  that  the  asymptotic  superlincar  convergence 
rate  is  evid<ajt  from  the  imit  steplcngths  in  the  “STEP”  coluum  and  the  sequence  of  converging 
“NORM  GZ”  entries.  Tins  column  may  be  used  to  verify  the  convergence  to  a  local  nuniinum  (see 
Gill,  Murray  and  Wright,  1981). 

5.2.  Are  IQP  methods  superior  to  EQP  methods?  It  is  a  popular  myth  that  the  more 
“opportunistic”  IQP  active-set  strategy  will  tend  to  find  the  correct  active  set  faster  than  an 
EQP  strategy.  The  next  two  problems  are  intended  to  demonstrate  that  this  is  nut  always  the 
case.  On  some  problems  the  IQP  method  will  be  faster,  on  others  it  will  be  slower. 

Li  Figures  2a  and  2b  we  give  the  IQP  and  EQP  results  for  the  minimization  of  the  six- 
variable  nouhucar  test  function  “Exp  6”  subject  to  simple-bound  constraints  upon  the  variables 
(see  Biggs,  1972b).  The  constraints  are  the  vector  of  simple  lower  bounds  (0.5, 9, 0.9, 4, 3, 2)  and 
simple  upper  bounds  (00,00,00,00, 4.9,  oo).  The  solution  lies  at  the  point  (1, 10, 1,5, 4, 3),  where 
no  simple  bounds  arc  active.  (For  brevity,  the  output  from  some  of  the  less  import<int  iterations 
has  been  omitted.) 

Exp  G  is  a  problem  for  wliich  the  IQP  method  is  substantially  faster  than  the  EQP  method. 
Since  no  bounds  arc  active  at  the  solution,  the  ability  of  the  quadratic  subproblom  to  drop  many 
of  them  during  one  major  iteration  allows  the  IQP  method  to  identify  the  correct  working  set 
rapidly.  By  the  seventh  iteration,  all  the  bounds  have  been  deleted  from  the  working  set.  An 
adequate  solution  is  identified  at  iteration  32.  By  contrast,  the  reipiirement  of  8om<!  significant 
reduction  in  the  projected  gradient  on  each  working  set  considerably  slows  down  the  rate  at  wliich 
the  EQP  mothod  ciui  delete  constraints.  It  is  not  until  iteration  55  that  the  correct  working  set 
is  identified. 

Cases  wluTC  the  more  conservative  strategy  of  the  EQP  method  gives  better  performance 
arc  illustrated  by  the  problem  “Weapon”  (see  Bracken  <uid  McCormick,  19C8).  The  objective 
function  is 

30  & 

<=i 

which  is  iaiiiiiiiiz<'d  subje<’l.  to  twc'lve  g<'u«'ral  liiie.-u*  constraints  and  bounds  on  all  the  variables. 
At  the  solution,  75  bounds  and  s(<ven  lineju'  constraints  are  active. 

When  appli<'d  to  this  problem,  both  methods  require  approxinnately  the  same  nuinber  of 
function  evaluations  (sra  Figures  3a  and  3b).  However,  the  EQP  method  needs  significantly  less 
CPU  time  to  obtain  the  solution  —  3.42  8<<cond8  compared  to  19.48  seconds  for  the  IQP  method. 
The  reason  for  this  discrcpmicy  is  that  the  EQP  method  nuuittains  a  much  better  approximation 
to  the  working  set.  While  moving  from  the  initial  feasible  point  (a  verti'x)  to  the  solution,  the 
EQP  method  niaintmns  a  projected  Ih'ssian  approximation  that  never  becomes  larger  tlnon  19 
(the  size  of  the  projt'cted  Hessian  at  the  solution  is  18).  This  performance  is  to  be  contrasted 
with  that  of  the  IQP  method,  for  which  th<*  diinetisioii  of  Z'^^HZ  increased  to  81  during  the  first 
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Figure  2a.  Results  of  an  IQP  method  on  the  six-variable  problem  Exp  6. 
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Figure  2b.  Results  of  an  EQP  method  on  the  six-variable  problem  Exp  6. 
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Figure  3a.  R('E<ultfl  of  an  IQP  iiiotluxl  on  the  Weapon  problem. 


two  iiutjor  it<-ratioiiti.  During  tin*  QP  mihprohh'iiiH  tin*  j»i>or  I]e.sHi.'ni  n{>pn>xijnation  mum's 
Um'  KJP  metiicxl  Jo  cUrlete  too  many  coiiKtrainy  from  the  working  m't.  Further  niimw  it^'r.ationa 
are  then  iice'eIwI  to  add  the  ronstrauits  that  were  deleted  nuiieceaHarily.  The  large  Elimensiou  of 
the  projc>cte<l  Hessian  also  inhibits  the  ability  of  the  method  to  recover  quickly.  Weapon  has  a 
Hessian  matrix  with  many  negative  and  scro  dgenvalncs.  If  there  are  not  enough  constraints 
in  the  working  set,  the  linesearch  function  is  likc'ly  to  have  negative  curvature  along  the  search 
(lirection  ami  the  (piasi-Newtoii  update  will  be  skipped.  If  this  occurs  for  sev«!rnl  consecutive 
iterations,  the  original  poor  curvature  inforuiatiou  is  left  unaltered  as  is  the  inability  of  the 
QP  to  predict  the  correct  active  set.  This  behavior  was  apparent  during  iterations  8-30  where 
no  quasi-Newton  update  coidd  be  performed. 


30 


Practical  Aspects  of  Nonlinear  Pragramming 


rm  JOCL 

JAM 

mp  HUMP 

OBJCCTZVf  OND 

LC 

HC 

wm  6Z 

HXN  in  COM  HZ 

COM  T 

CQHV 

• 

0 

0 

0.00-01 

1 

-1.9S390 

02 

91 

9 

0 

6.000-01 

•• 

1.0 

06 

4.D 

00 

T 

Trr 

1 

72L 

0 

t.OO  00 

2 

-S.154S0 

02 

90 

9 

1 

S.260  00 

-2.70  61 

1.0 

00 

4.0 

00 

T 

PIT 

t 

101U 

0 

1.00  00 

3 

-3.43720 

02 

96 

6 

2 

4.440  00 

-2.40 

01 

1.0 

00 

1.0 

00 

T 

PTP 

i 

47L 

0 

1.00  00 

4 

-S.24900 

02 

69 

6 

3 

S.47D  00 

-2.20 

01 

2.0 

00 

1.0 

00 

T 

PTP 

4 

97L 

0 

1.00  00 

s 

-4.64SOO 

02 

66 

6 

4 

4.460  00 

-1.90 

01 

2.0 

00 

2.0 

00 

T 

PTP 

s 

67L 

0 

1.00  00 

4 

-6.12370 

02 

67 

6 

5 

3.430  00 

-1.30 

01 

3.0  00 

2.0 

00 

T 

PTP 

4 

771 

941 

2.10-01 

7 

-0.42750 

02 

67 

6 

S 

7.900  00 

-1.30 

01 

00 

2.0 

60 

T 

PTP 

7 

441. 

44L 

1.40-01 

6 

-6.79370 

02 

67 

6 

S 

1.210  01 

-1.20 

01 

3.0 

00 

2.0 

00 

T 

PTP 

• 

0 

0 

1.00  00 

9 

-9.94360 

02 

67 

6 

S 

5.260  00 

— 

3.0 

00 

2.0 

00 

T 

PTP 

f 

621 

0 

t.OO  00 

to 

-1.10430 

03 

64 

6 

4 

3.000  06 

-1.30 

61 

3.0 

66 

3.0 

00 

T 

PTP 

IQ 

92L 

0 

1.00  00 

t1 

-1.20140 

03 

6S 

6 

7 

0.S30  OO 

-1.30 

01 

3.0 

00 

3.0 

00 

T 

PTP 

St 

22L 

0 

1.00  00 

21 

-1 .50630 

03 

78 

6 

14 

1.400  00 

-3.S0 

00 

4.0 

06 

3.0 

60 

T 

PTP 

3Q 

0 

0 

1.0D  00 

31 

-1.56920 

03 

7S 

6 

17 

1.SI0  00 

— 

3.0 

01 

3.0 

00 

T 

PTP 

M 

0 

67L 

S.50-01 

41 

-1.44140 

03 

76 

7 

IS 

1.110  00 

— 

1.0 

01 

3.0 

00 

T 

PTP 

M 

0 

0 

1.00  00 

SI 

-1 .493S0 

03 

77 

4 

17 

1.660  00 

•• 

1.0 

01 

3.0 

00 

T 

PTP 

40 

0 

0 

1.00  00 

42 

-1.71000 

03 

79 

7 

14 

4.340-01 

4.0 

01 

2.D 

00 

T 

PTP 

70 

0 

0 

1.00  00 

72 

-1.7IS20 

03 

76 

6 

14 

3.340-01 

3.0 

01 

2.0 

00 

T 

PTP 

40 

0 

0 

1.00  00 

62 

-1.71900 

03 

76 

6 

14 

2.490-01 

6.0 

01 

2.0 

00 

T 

PTP 

90 

0 

0 

1.00  00 

92 

-1.72420 

03 

76 

6 

14 

2.0SO-01 

-• 

3.0 

02 

2.0 

00 

T 

PTP 

too 

0 

0 

1.00  00 

too 

-1.72470 

03 

77 

6 

IS 

0.740-01 

2.0 

01 

2.0 

66 

T 

PTP 

110 

0 

0 

1.00  00 

112 

-1.72690 

03 

77 

4 

17 

2.100-61 

3.0 

01 

2.0 

00 

T 

PTP 

120 

0 

0 

t.OO  00 

122 

-1.73090 

03 

76 

4 

14 

1.690-01 

4.0 

01 

2.0 

66 

T 

PTP 

ISO 

0 

0 

1.00  00 

ISO 

-1.731S0 

03 

77 

4 

17 

1.690-61 

3.0 

01 

2.0  66 

T 

PTP 

IM 

0 

0 

1.00  00 

140 

•1.73220 

03 

79 

4 

It 

4.940-02 

— 

1.0 

02 

2.0 

06 

T 

PTP 

ISO 

0 

0 

t.OO  00 

ISO 

-1.73240 

03 

76 

4 

14 

5.960-02 

2.0 

02 

2.0 

00 

T 

PTP 

140 

0 

0 

1.00  00 

142 

-1.73290 

03 

76 

4 

14 

5.990-02 

•• 

7.0 

01 

2.0 

00 

T 

PTP 

170 

0 

0 

1.00  00 

172 

-1.73310 

03 

77 

4 

17 

5.220-02 

— 

6.0 

01 

2.0 

00 

T 

PTP 

160 

0 

0 

1.00  00 

162 

-1.73330 

03 

74 

4 

16 

4.320-02 

9.0 

01 

2.0 

00 

T 

PTP 

190 

73U 

0 

t.OO  00 

192 

•1.73350 

03 

7S 

4 

19 

2.270-02  -t.OO-02 

2.0 

02 

2.0 

00 

T 

PTP 

200 

0 

0 

1.00  00 

202 

-1. 73370 

03 

7S 

4 

19 

4.490*02 

2.0 

02 

2.0 

00 

T 

PTP 

210 

0 

0 

t.OO  00 

212 

•1.73440 

03 

7S 

7 

16 

5.490-02 

— 

2.0 

01 

2.0 

00 

T 

PTP 

220 

0 

0 

t.OO  00 

222 

-1.73470 

03 

7S 

7 

16 

2.340-02 

•• 

3.0 

01 

2.0 

00 

T 

PTP 

230 

0 

0 

1.00  00 

232 

•1.73460 

03 

74 

7 

19 

4.150-03 

— 

3.0 

01 

2.0 

60 

T 

PTP 

240 

0 

0 

t.OO  00 

242 

•1.73SOO 

03 

7S 

7 

16 

1.740-04 

4.0 

01 

2.0  60 

T 

TTP 

241 

0 

0 

1.00  00 

243 

•1.73500 

03 

7S 

7 

16 

S.620-OS 

••» 

4.0 

01 

2.0 

00 

T 

TTP 

242 

0 

0 

1.00  00 

244 

-1.73500 

03 

7S 

7 

16 

1 .450-05 

•• 

4.0 

01 

2.0  66 

T 

TTP 

243 

0 

0 

1.00  00 

24S 

•1.73500 

03 

75 

7 

16 

1 .930-04 

— 

4.0 

01 

2.0  60 

T 

TTT 

txxT  ue  pwac.  imfomi  •  •  rrn  ■  nfcvai  •  **t 


Figure  3b.  RthuIIs  of  an  EQP  uiolhod  on  the  Weapon  problem. 


It  is  difficult  to  predict  in  advance  whether  a  particular  problem  will  be  more  suitable  fi>r  an 
EQP  method  or  an  IQP  uiethotl.  IQP  iiK'thods  arc  likely  to  be  less  efficient  on  problems  for  which 
th<'  QP  multipliers  cluuige  rapidly  fr<iiii  one  itoratioii  to  the  ix'xt.  Problems  in  this  category  tend 
to  be  highly  nonlinear  or  to  have  many  siiiall  Lagrange  inulti]>Iiers.  In  either  c<ua',  the  significant 
changes  Ui  the  working  set  betw«vn  iteratioiis  st'riously  impair  the  ability  of  the  (piasi-Newton 
uptlale  to  build  useful  curvature  iiirorm.'ition  about  the  functioii.  (\>iiversely,  IQP  methods  will 
tend  U>  be  efficient  if  the  QP  multiplier  estiniatos  are  very  m’curato  when  compuU'd  at  points 
that  are  far  from  the  solution  (for  exam]tlc,  if  the  problem  were  close  to  being  quadratic). 

To  a  Large  extent,  the  relative  efficiency  of  IQP  and  EQP  methods  depends  upon  the  number 
of  constraints  active  at  the  solution.  EQP  methods  are  usually  implemented  so  that  as  many 
constraints  as  possible  arc  included  in  the  initial  working  set.  It  is  therefore  not  surprising  that 
they  torn!  to  be  more  efficient  when  more  constraints  are  active  at  the  solution.  Finally,  the  relative 
efficiency  of  a  method  is  critically  depemlcnt  upon  the  ratio  of  tli(>  amount  of  work  required  to 
perform  a  single  minor  iteration  compared  to  tlic  work  reqtiired  to  evaluate  the  problem  functions. 
As  this  ratio  increases  (ns  it  often  docs  as  the  sixe  of  tlie  problem  increases),  the  advantage  will 
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swing  towards  the  EQP  method. 

5.3.  Typical  performance  of  an  SQP  method  for  nonlinear  constraints.  The  romping 
runs  were  obtained  from  Version  2.1  of  the  program  NPSOL  (sec  Gill  ct  al.,  1984b),  an  IQP  quasi- 
Newton  method  for  nonliuestrly  constrained  optimisation.  The  Hessian  of  each  QP  subproblem 
is  a  positivo-deiiinte  BFGS  approximation  to  the  Hessian  of  the  Lagrangian  function.  The  QP 
subproblcni  is  solved  using  a  feasible-point  active-set  method  with  an  orthogonal  factorisation  of 
the  constraints  in  the  working  set. 

The  merit  function  used  in  NPSOL  is  a  smooth  augmented  Lagrangian  function  that  utilises 
the  properties  of  slack  variables.  The  inequality  constraints  of  NIP  can  be  reformulated  as  equality 
constraints  by  adding  simply-bounded  slack  variables  .vy.  Estimates  of  the  slack  variables  arc 
used  in  the  liiiesearch  to  give  a  smooth  augmented  Lagrangian  function.  At  each  major  itera*^'on, 
a  vector  triple  ^p,SA,Ss)  is  computed  that  serves  as  a  direction  of  search  for  the  variables  z, 
multiplier  estimates  A,  and  slack  variables  a.  (All  the  elements  of  the  vector  triple  are  avmlable 
from  the  solution  of  the  standard  IQP  subproblem  considered  in  Section  4.2.  The  vector  SA  is 
defined  as  p  -  A,  where  p  are  the  QP  multipliers,  and  the  vector  Ss  is  given  by  Ap+c—s.  Note  that 
the  QP  solver  does  not  need  to  treat  the  elements  of  s  as  additional  variables.)  The  stcplength 
is  required  to  produce  a  sufficient  decrease  in  the  augmented  Lagrangian  merit  ftinction 

£(i,A,s)  =  F(x}  -  A<(c,(z)  -  Si)  +  J  "  *•■)’• 

<-i  ^  i=i 

The  value  of  p  is  initially  set  to  sero,  and  is  occasionally  incrcastMl  from  its  value  in  the  previous 
iteration  in  order  to  (insure  descent  for  the  merit  fiiiictiou.  Thus  the  Be<|ueiice  of  jn'iialty  pa- 
raiiK'ters  is  g<ni<Tal]y  iion-decnvasiiig,  although  NPSOL  has  the  ability  to  rt'diice  the  value  of  the 
pt'iialty  parameter  a  limited  number  of  times. 

The  iter.atioii  summary  printed  in  each  of  Figures  4  0  is  identical  to  that  providecl  by  the 
linearly  constrained  IQP  method,  except  that  the  merit  function  value  (“MERIT'’)  is  printed 
instead  of  the  «>bjective  value,  and  the  atlditional  columns  “NC”,  “NORM  C”  and  “RHO”  give  the 
number  of  nonlin<*nr  constraints  in  the  working  set,  the  iw«>-nonn  of  the  n'siduals  of  constraints 
in  tlie  working  s<>t  and  the  pemtlty  |uu‘aineter  iistHl  in  the  merit  function.  Li  all  of  the  NPSOL 
runs,  the  h'.'isibility  loleraiire  for  each  nonliiiecur  roustnunl  was  set  at  10 

Two  runs  wtre  selectcHl  to  illustrate  the  behavior  of  mi  IQP  method  when  solving  wdl 
behavcKl  (but  non-trivial)  nonlinear  problems.  Figttfe  4  gives  the  results  obtaint'd  on  a  version  of 
the  Hexagon  problem.  (For  more  dctmls  of  tliis  problem,  see  Wright,  1076.  A  slightly  different 
formulation  is  given  as  Problem  108  by  Hock  and  Schittkowski,  1081.)  Ih'xagon  is  a  popular  test 
problem  for  nonlinear  programming  im^thods.  All  cunstrmnt  types  art'  inchuhnl  (bounds,  linear, 
noiiliiiear),  mid  the  Htnsimi  of  the  Lagrmigian  hmctioii  is  not  positive  definite  at  the  solution.  The 
problem  has  nine  variables,  finite  bounds  on  six  of  the  variables,  four  general  linear  constraints, 
mid  fifteen  nonlinear  constraints.  Six  nonlinear  constraints  arc  active  at  x. 
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The  problem  solved  in  Figure  5  is  derived  from  a  30-bus  optimal  power  flow  (OFF)  problem 
of  o])timizing  the  distribution  of  electrical  power  over  a  network.  The  problem  has  67  variables, 
60  nonlinear  constraints,  and  upper  and  lower  bounds  on  all  of  the  variables.  At  the  solution,  54 
nonlinear  constraints  and  three  simple  bounds  arc  active. 
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Figure  4.  Output  from  tlu'  solution  of  the  w<?l]-bclmved  problem  Hexagon. 


On  these  two  well  behaved  problems,  the  approximate  Hessian  and  working  set  remain  rela¬ 
tively  well-coiuIitioiKMl.  Similarly,  tlu*  penalty  parameters  remain  small  and  appn>ximatcly  con- 
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stant.  The  two  runs  illustrate  much  of  the  numerical  bdiavior  of  a  quasi-Newton  IQP  method 
that  is  predicted  from  theoretical  analysis.  As  xjc  approaches  the  solution,  just  one  minor  iteration 
is  performed  per  m^or  iteration,  and  entries  in  the  “NORN  GZ”  and  “NORM  C”  columns  exhibit 
the  superlincar  convergence  rate  discussed  in  Section  3.3.  Note  that  the  constraint  violations 
converge  earlier  than  the  projected  gradient.  The  final  values  of  the  projected  gradient  norm  and 
constraint  norm  reflect  the  limiting  accuracy  of  the  two  quantities.  It  is  possible  to  achieve  almost 
full  precision  in  the  constrmnt  norm  but  only  half  precision  in  the  projcctcd-gradient  norm. 
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Figure  5.  Output  from  the  solution  of  the  OFF  problem. 

The  status  and  values  of  the  variabkn  juid  constraints  at  the  final  solution  give  ust^fiil  infor¬ 
mation  about  the  progress  of  a  miuiinisiitinii  and  the  degree  of  difficulty  of  the  problem.  Figure 
4  includes  the  final  solution  output  from  NPSOL  for  Hexagon.  The  printout  is  divided  into  three 
sections,  giving  information  about  the  final  status  of  the  variables,  general  linear  constraints  and 
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nonlinear  constraints,  respectively.  Within  each  section,  “STATE"  gives  the  status  of  the  associ¬ 
ated  constraint  in  the  predicted  cuttivc  set  (FR  if  not  included,  EQ  if  a  fixed  v<duc,  LL  if  at  its  lower 
bound,  and  UL  if  at  its  upper  bound;.  ifALUE”  is  the  value  of  the  constraint  at  the  final  iteration. 
“LOWER  BOUND”  and  “UPPER  BOUND"  give  the  lower  and  upper  boimds  specified  for  the  constraint 
(“NONE”  indicates  that  no  bound  is  enforced).  “LAGR  MULTIPLIER”  is  the  value  of  the  Lagrange 
multiplier.  This  will  be  zero  if  STATE  is  FR.  The  multiplier  is  non-negative  if  STATE  is  LL,  and 
non-positive  if  STATE  is  UL.  “RESIDUAL”  gives  the  difference  between  the  entry  in  the  “VALUE” 
column  and  the  nearer  bound. 

In  the  first  section,  “VARIABLE”  is  the  name  (VARBL)  and  index  of  a  variable.  In  the  lin¬ 
ear  constraints  section,  “LINEAR  CONSTR”  is  the  name  (LNCON)  and  index  of  a  linear  constraint. 
“NONLNR  CONSTR”  is  the  name  (NLCON)  and  index  of  a  nonlinear  constraint. 

Note  that,  although  the  feasibility  tolerance  for  the  nonlinear  constraints  is  of  the  order  10~*, 
the  fiiuU  accuracy  is  considerably  better  than  this.  This  is  because  the  constraint  violations 
are  being  refined  during  the  last  few  iterations  while  the  algorithm  is  working  to  reduce  the 
projected-gradient  norm.  Another  feature  worth  noting  is  that  the  the  constraint  values  and 
Lagrange  midtipliers  at  the  solution  are  “well  balanced”.  For  example,  all  the  multipliers  are 
approximately  of  the  same  order  of  magnitude.  This  behavior  is  typical  of  a  well-scaled  problem. 

5.4.  Performance  on  an  ill-conditioned  problem.  Fimally,  we  give  the  results  of  the  IQP 
method  on  the  problem  Dembo  7.  This  problem  is  a  geometric  programming  formulation  devel¬ 
oped  by  Dembo  (197G)  of  a  five-stage  membrane  separation  process.  The  problem  has  sixteen 
VjU’iabh's,  eight  linear  constraints,  and  eleven  nonlinear  constraints.  All  sixteen  variables  have 
siin])le  u]>por  and  lower  botuid  constraints.  The  problem  causes  many  difiicultics  for  a  nonlinear 
progratiiiuing  algorithm  because  of  ba<l  scaling  and  linearly  dependent  constraints. 

The  restilts  for  Dembo  7  show  a  number  of  featm-<*s  that  arc  common  to  badly  behaved 
problems.  First,  note  that  the  number  of  minor  iterations  does  not  decline  quickly.  Moreover, 
the  i)resence  of  ncar-z<>ro  Lagr.ange  multipliers  sometimes  causes  the  QP  t.o  require  more  than 
one  iti'ration  relatively  close  to  the  solution.  A  very  common  .symptom  of  a  badly  behaved 
probh'iii  is  th<'  large  value  of  the  condition  estimator  of  th«'  full  approxiin.'ile  Ih*ssian,  which  is 
to  be  contr;ist<’<l  with  the  ndatively  nKMh<st  v.alue  i»f  the  condition  of  the  projrcU'il  Hessian.  This 
observation  luus  s«»nie  n’levaiice  to  the  choice  «if  method  for  the  tJP  sul>|>roblein.  dlejirly,  special 
care  must  be  taken  when' iinpleiiienting  ;uiy  QP  method  that  recpiirra  th(<  fiu-tors  of  the  full 
Hessian  (ns  o])pos(Hl  to  the  proj<H'ted  Hessian).  Starting  the  minor  iterations  at  an  unconstrained 
miiiiinum  of  the  QP  subproblem  will  res\dt  in  very  large  values  of  6p  (see  Section  4.3). 

Note  that  the  third  bound  constr.aint,  the  third  linear  constraint  and  eleventh  nonlinear  con¬ 
straint  .all  have  very  sm.all  n»ii<luals  but  are  not  in  the  working  set.  The  vdiics  of  the  nonlinear 
con.str.aints  in  th(?  working  set  vary  sigiiificaiitly  in  ordi;r  of  nuagnitiide,  imlicating  that  the  con¬ 
straints  .are  badly  sc.aled.  In  constr.ast  to  the  solution  of  Hextagon,  in  which  the  accuracy  of  the 
constr.aints  was  much  better  than  required  by  the  convergence  tolerance,  some  of  the  nonlinear 
constraints  arc  only  just  satisfied  to  the  requirt'd  feasibility  tnlerance. 
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Figure  6.  Output  from  the  (lohition  of  Dembo  7. 
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Finally,  we  wish  to  emphasize  that,  despite  severe  ill-conditioning  in  the  Hessian  of  the 
Lagnuigian  and  serious  dependencies  among  the  constraints,  Dembo  7  is  solved  in  a  relatively 
routine  manner.  Dependent  constraints  are  successfully  omitted  from  the  working  set  in  such  a 
way  that  its  condition  estimator  never  gets  much  larger  than  10^.  Moreover,  the  final  convergence 
rate,  although  not  supcrlincar,  is  quite  rapid. 
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